Querying Data and an Associated Ontology in a Database Management System

ABSTRACT

A method, apparatus, and computer program product for querying data and an associated ontology in a database. An ontology is associated with data in a database. Responsive to receiving a query from a requestor, relational data in the database is identified using the query to form identified relational data. Ontological knowledge in the ontology is identified using the identified relational data and the ontology. A result is returned to the requestor.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to data processing systems and in particular, the present invention relates to a method, apparatus and computer program product for querying data in a database. Still more particularly, the present invention relates to a method, apparatus, and computer program product for querying data and an associated ontology in a database management system.

2. Description of the Related Art

The term “data” generally refers to information that is highly structured and has fixed relationships between the different pieces of information, called the data elements. A set of data elements that are logically related may be stored in a systematic way as a collection of records in a computer, called a database. The logical relationships between the data elements allow the database to be queried and information extracted from the database. By querying the database, a user can extract meaningful information about the data elements. The computer program used to manage and query a database is known as a database management system (DBMS).

The database management system manages the data based on the relationships between the data elements. A database management system manages the data by providing a way to perform various operations on the data elements. The operations that may be performed on the data elements in a database include adding, removing, modifying, sorting, and querying data elements. A database query typically contains one or more logical rules. In processing a query, the database management system extracts from the database all the data elements that match the logical rules in the query.

The term “ontology” generally refers to knowledge about the data elements. A given set of data elements may have one or more associated ontologies. An ontology has characteristics that make it unsuitable for storage in a database. For example, the knowledge in an ontology is typically less structured than the data elements. Therefore, an ontology is typically not stored or managed by a database management system.

Currently, users can query data elements in a database using a database management system. However, users cannot query the ontology associated with the data elements in the same way, because the ontology is not suited for being stored in a database. Users also cannot query the data elements and the ontology together to infer new knowledge.

Because the ontology contains valuable information about the data elements, if the data elements and ontology could be linked and managed together, users could then formulate queries to infer new knowledge based on the data elements and the ontology.

SUMMARY OF THE INVENTION

The different embodiments provide a method, apparatus, and computer program product for querying data and an associated ontology in a database. An ontology is associated with data in a database. Responsive to receiving a query from a requestor, relational data in the database is identified using the query to form identified relational data. Ontological knowledge in the ontology is identified using the identified relational data and the ontology. A result is returned to the requestor.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:

FIG. 1 depicts a pictorial representation of a network of data processing systems, in which illustrative embodiments may be implemented;

FIG. 2 is a block diagram of a data processing system, in which illustrative embodiments may be implemented;

FIG. 3 is a block diagram of a user interaction with a database management system (DBMS), in accordance with an illustrative embodiment;

FIG. 4 is a block diagram of a class hierarchy in a wine ontology in accordance with an illustrative embodiment;

FIG. 5 depicts rules in a wine ontology, in accordance with an illustrative embodiment;

FIG. 6 depicts a class hierarchy for the locatedIn property in accordance with an illustrative embodiment;

FIG. 7 is a diagram depicting database commands in accordance with an illustrative embodiment;

FIG. 8 is a diagram depicting a virtual view command in accordance with an illustrative embodiment;

FIG. 9 depicts commands to a hybrid relational-XML database in accordance with an illustrative embodiment;

FIG. 10 is a block diagram depicting a user interaction with an ontology repository in accordance with an illustrative embodiment;

FIG. 11 is a block diagram depicting extracted information, in accordance with an illustrative embodiment;

FIG. 12 is an example of code for constructing a Wine class hierarchy in accordance with an illustrative embodiment;

FIG. 13 is a block diagram of a class and associated sample code in accordance with an illustrative embodiment;

FIG. 14 is an example of code for specifying transitive properties of a Wine ontology in accordance with an illustrative embodiment;

FIG. 15 is an example of a conjunctive implication in accordance with an illustrative embodiment;

FIG. 16 is an example of a disjunctive implication in accordance with an illustrative embodiment;

FIG. 17 is a block diagram of an implication graph in accordance with an illustrative embodiment;

FIG. 18 is an example of a class hierarchy in accordance with an illustrative embodiment;

FIG. 19 is a flow diagram of an ontology processor in accordance with an illustrative embodiment;

FIG. 20 is a flow diagram for extracting a class hierarchy in accordance with an illustrative embodiment;

FIG. 21 is a flow diagram for extracting transitive properties in accordance with an illustrative embodiment;

FIG. 22 is a flow diagram for constructing an implication graph in accordance with an illustrative embodiment;

FIG. 23 is a base table of wine products and an associated wine ontology in accordance with an illustrative embodiment;

FIG. 24 is a flow diagram of a processor in accordance with an illustrative embodiment;

FIG. 25 is an example of a query in accordance with an illustrative embodiment;

FIG. 26 is an example of a query in which illustrative embodiments may be implemented; and

FIG. 27 is an example of a query in which illustrative embodiments may be implemented.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

With reference now to the figures and in particular with reference to FIGS. 1-2, exemplary diagrams of data processing environments are provided in which illustrative embodiments may be implemented. It should be appreciated that FIGS. 1-2 are only exemplary and are not intended to assert or imply any limitation with regard to the environments in which different embodiments may be implemented. Many modifications to the depicted environments may be made.

With reference now to the figures, FIG. 1 depicts a pictorial representation of a network of data processing systems in which illustrative embodiments may be implemented. Network data processing system 100 is a network of computers in which embodiments may be implemented. Network data processing system 100 contains network 102, which is the medium used to provide communications links between various devices and computers connected together within network data processing system 100. Network 102 may include connections, such as wire, wireless communication links, or fiber optic cables.

In the depicted example, server 104 and server 106 connect to network 102 along with storage unit 108. In addition, clients 110, 112, and 114 connect to network 102. These clients 110, 112, and 114 may be, for example, personal computers or network computers. In the depicted example, server 104 provides data, such as boot files, operating system images, and applications to clients 110, 112, and 114. Clients 110, 112, and 114 are clients to server 104 in this example. Network data processing system 100 may include additional servers, clients, and other devices not shown.

In the depicted example, network data processing system 100 is the Internet with network 102 representing a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another. At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, governmental, educational and other computer systems that route data and messages. Of course, network data processing system 100 also may be implemented as a number of different types of networks, such as, for example, an intranet, a local area network (LAN), or a wide area network (WAN). FIG. 1 is intended as an example, and not as an architectural limitation for different embodiments.

With reference now to FIG. 2, a block diagram of a data processing system is shown in which illustrative embodiments may be implemented. Data processing system 200 is an example of a computer, such as server 104 or client 110 in FIG. 1, in which computer usable code or instructions implementing the processes may be located for the illustrative embodiments.

In the depicted example, data processing system 200 employs a hub architecture including a north bridge and memory controller hub (MCH) 202 and a south bridge and input/output (I/O) controller hub (ICH) 204. Processing unit 206, main memory 208, and graphics processor 210 are coupled to north bridge and memory controller hub 202. Processing unit 206 may contain one or more processors and even may be implemented using one or more heterogeneous processor systems. Graphics processor 210 may be coupled to the MCH through an accelerated graphics port (AGP), for example.

In the depicted example, local area network (LAN) adapter 212 is coupled to south bridge and I/O controller hub 204. Audio adapter 216, keyboard and mouse adapter 220, modem 222, read only memory (ROM) 224, universal serial bus (USB) ports and other communications ports 232, and PCI/PCIe devices 234 are coupled to south bridge and I/O controller hub 204 through bus 238, and hard disk drive (HDD) 226 and CD-ROM drive 230 are coupled to south bridge and I/O controller hub 204 through bus 240. PCI/PCIe devices may include, for example, Ethernet adapters, add-in cards, and PC cards for notebook computers. PCI uses a card bus controller, while PCIe does not. ROM 224 may be, for example, a flash basic input/output system (BIOS). Hard disk drive 226 and CD-ROM drive 230 may use, for example, an integrated drive electronics (IDE) or serial advanced technology attachment (SATA) interface. A super I/O (SIO) device 236 may be coupled to south bridge and I/O controller hub 204.

An operating system runs on processing unit 206 and coordinates and provides control of various components within data processing system 200 in FIG. 2. The operating system may be a commercially available operating system such as Microsoft® Windows® XP (Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both). An object-oriented programming system, such as the Java™ programming system, may run in conjunction with the operating system and provide calls to the operating system from Java programs or applications executing on data processing system 200. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Instructions for the operating system, the object-oriented programming system, and applications or programs are located on storage devices, such as hard disk drive 226, and may be loaded into main memory 208 for execution by processing unit 206. The processes of the illustrative embodiments may be performed by processing unit 206 using computer-implemented instructions, which may be located in a memory such as, for example, main memory 208, read only memory 224, or in one or more peripheral devices.

The hardware in FIGS. 1-2 may vary depending on the implementation. Other internal hardware or peripheral devices, such as flash memory, equivalent non-volatile memory, or optical disk drives and the like, may be used in addition to or in place of the hardware depicted in FIGS. 1-2. Also, the processes of the illustrative embodiments may be applied to a multiprocessor data processing system.

In some illustrative examples, data processing system 200 may be a personal digital assistant (PDA), which is generally configured with flash memory to provide non-volatile memory for storing operating system files and/or user-generated data. A bus system may be comprised of one or more buses, such as a system bus, an I/O bus and a PCI bus. Of course the bus system may be implemented using any type of communications fabric or architecture that provides for a transfer of data between different components or devices attached to the fabric or architecture. A communications unit may include one or more devices used to transmit and receive data, such as a modem or a network adapter. A memory may be, for example, main memory 208 or a cache such as found in north bridge and memory controller hub 202. A processing unit may include one or more processors or CPUs. The depicted examples in FIGS. 1-2 and above-described examples are not meant to imply architectural limitations. For example, data processing system 200 also may be a tablet computer, laptop computer, or telephone device in addition to taking the form of a PDA.

Introduction

The different embodiments provide a method, apparatus, and computer program product for querying data in a database. An ontology is associated with the data. Responsive to receiving a query from a requestor, relational data in the database is identified using the query to form identified relational data. Ontological knowledge in the ontology is identified using the identified relational data and the ontology. A result is returned to the requestor.

Most companies today have large amounts of data stored in relational databases. Typically, this data has been gathered over many years. The company may also have knowledge in a semi-structured form that does not easily lend itself to being stored in relational form. It would be useful for a company to be able to infer new information by storing and querying both the data and knowledge about the data (ontology).

Of course, one way of storing and querying data and ontology is to materialize the ontology and store the resulting relationships. However, materializing the ontology reduces the ontology to relational data and removes the advantages of keeping the ontology in a semi-structured form.

Embodiments solve this problem by using the framework described herein. Class hierarchies, implication rules, and transitive properties are extracted from the ontology and stored in Extensible Markup Language (XML), allowing the data and the ontology to be queried together. The class hierarchies, implication rules, and transitive properties allow queries to infer knowledge about the data that is not contained in the relational tables, while leaving the ontology in semi-structured form.
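As a minimal sketch of what storing an extracted class hierarchy in XML could look like, the following Python code nests each class element inside the element of its superclass, using the standard `xml.etree.ElementTree` module. The `class` element name, the `name` attribute, and the `hierarchy_to_xml` function are hypothetical choices for illustration. The source does not specify the XML schema used by the embodiments.

```python
import xml.etree.ElementTree as ET


def hierarchy_to_xml(superclass: dict[str, str], root: str) -> str:
    """Serialize a child -> parent class map as nested <class> elements.

    Assumes the map describes a tree rooted at `root` (hypothetical schema).
    """
    # Invert the child -> parent map so each class knows its subclasses.
    children: dict[str, list[str]] = {}
    for child, parent in superclass.items():
        children.setdefault(parent, []).append(child)

    def build(name: str) -> ET.Element:
        # Each subclass becomes a child element of its superclass element.
        elem = ET.Element("class", name=name)
        for child in sorted(children.get(name, [])):
            elem.append(build(child))
        return elem

    return ET.tostring(build(root), encoding="unicode")


# A fragment of the wine class hierarchy (hypothetical encoding).
wine_classes = {
    "potableLiquid": "thing",
    "wine": "potableLiquid",
    "riesling": "wine",
    "burgundy": "wine",
}
xml_text = hierarchy_to_xml(wine_classes, "thing")
print(xml_text)
```

Because the hierarchy stays nested rather than being flattened into rows, its semi-structured form is preserved. A hybrid relational-XML engine could then store such a document alongside the relational tables.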

Moreover, using this framework, a user can create queries that infer knowledge in a manner similar to how the user would normally create queries for relational data. This reduces any learning curve and allows a user to create queries for inferring knowledge relatively quickly.

An ontology is a model of entities and relationships in a specific domain of knowledge. An ontology that is associated with a set of data elements is knowledge about the data, and is also known as domain knowledge. Domain knowledge is typically knowledge that is obtained from humans who are experts in a particular area and then transformed by knowledge engineers into a set of entities and relationships.

A set of data elements contains one or more data elements. Current database management systems (DBMS) are not able to easily and seamlessly manipulate both the data elements and the associated ontology. It would be useful if the associated ontology could be managed in the same way as the data, so that users could query the data, query the ontology, and query inferences derived from the data and the ontology, just as they query relational data.

The ability to query data, domain knowledge, and inferences derived from the data and domain knowledge is called semantic data management. In order to support semantic data management in a database management system, embodiments provide a framework for managing relational data and an associated ontology that bridges the gap between data representation, knowledge representation, and inferencing.

Each ontology element has an associated class. The classes in an ontology may be organized as a class hierarchy, in which the classes are arranged in a tree structure. In a class hierarchy, a class inherits one or more properties from the class or classes above it. For example, in a wine ontology, the region where the wine is grown may be a class. Thus, a wine grown in Mendocino, California may be represented in a class hierarchy by showing Mendocino below California and California below the United States.

An implication rule is a rule of logic showing the logical relationship between ontology elements. For example, if a wine is of type Burgundy, then the wine is grown in the region of France. This logical relationship may be written as the following implication rule:

$$(\text{type} = \text{Burgundy}) \rightarrow (\text{region} = \text{France})$$

In a transitive relationship, if A is related to B, and B is related to C, then it logically follows that A is related to C. For example, if Mendocino is in California, and California is in the United States, then it follows that Mendocino is in the United States.

Overview

With reference now to FIG. 3, a block diagram of a user interaction with a database management system (DBMS), in accordance with an illustrative embodiment, is depicted. In user interaction with a database management system 300, user 302 interacts with DBMS 304. User 302 can perform various operations on DBMS 304, including creating and sending a query to extract information from DBMS 304 and receiving the results of the query from DBMS 304.

DBMS 304 provides a virtual view 306 of base table 308, which is a conventional relational data table, and of ontology repository 310. Ontology repository 310 contains one or more ontologies associated with the data in base table 308. The term “ontology” refers to knowledge about the data elements in the relational data table. The knowledge in an ontology is typically less structured than the data elements. For example, a wine database may contain information about each type of wine, the price per bottle, and who makes it. The ontology may contain information such as where the grapes are grown, and the color of the grapes.

Virtual view 306 provides the user with a seamless and integrated view of both the data in base table 308 and a set of ontologies in ontology repository 310. A set of ontologies contains one or more ontologies. Virtual view 306 may appear to the user as a conventional database management system, and so the user may not be aware that he or she is viewing both data and ontology together in the virtual view.

The virtual view is created when the user associates a subset of the data elements in the relational data table with a subset of the ontologies in ontology repository 310. Here, subset means that the virtual view may include some or all of the data elements in the relational data table, and some or all of the ontologies in ontology repository 310.

User 302 queries DBMS 304 using virtual view 306. Virtual view query processor 312 receives the user's query, rewrites the query, and sends the rewritten query to query engine 314 for processing. Query engine 314 may be a hybrid relational-XML query engine.

Query engine 314 receives the rewritten query, executes the query and obtains information, and then returns the information to the user. The information obtained may be data from base table 308, knowledge from ontology repository 310, or an inference resulting from linking the data in base table 308 with the knowledge in ontology repository 310.

In a conventional database management system, user 302 sends a query to query engine 314, query engine 314 extracts information that matches the query from base table 308, and then sends the result of the query back to user 302. In an illustrative embodiment, ontology repository 310 is added to a conventional database management system so that both the data and the associated set of ontologies may be queried together. In an illustrative embodiment, query engine 314 is modified to handle both a relational database and a set of ontologies stored as XML files.

In an illustrative embodiment, virtual view 306 is added to a conventional database management system so that user 302 can view both data elements from base table 308 and ontology elements from ontology repository 310. Virtual view query processor 312 is added to a conventional system so that user 302 can query base table 308 and the associated ontologies in ontology repository 310 using the virtual view.

The different blocks in FIG. 3 are for purposes of illustration and not meant to limit the manner in which different features of illustrative embodiments may be implemented. The database management system framework shown in FIG. 3 extends a database management system to operate on not just data, but also domain knowledge, so that inferences from the domain knowledge and data may be made. To insulate the user from the details of the representation of the domain knowledge, the user is presented with a virtual view, through which domain knowledge appears to be no different from data. In this way, domain knowledge may be manipulated using relational operators that are fully incorporated and supported within the database management system. In addition, inferences may be made based on the data and the domain knowledge using relational operators.

TABLE 1

ID   Type        Origin       Maker           Price
1    Burgundy    CotesDOr     ClosDeVougeot   30
2    Riesling    NewZealand   Corbans         20
3    Zinfandel   EdnaValley   Elyse           15

Table 1 is a base table, such as base table 308 in FIG. 3, containing relational data for three wines. Each row in Table 1 is associated with a specific instance of a wine. Each wine has four attributes: type, origin, maker, and price. A conventional relational database management system allows a user to query data about the wines using these attributes. However, a user may only query and retrieve the data contained in Table 1.
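For concreteness, the following Python sketch loads Table 1 into an in-memory SQLite database with the standard `sqlite3` module. The table name `wine`, the column types, and the literal 'US' as an origin value are assumptions. A query over stored attributes succeeds. A query for U.S. wines returns nothing, because no row stores a U.S. origin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wine (id INTEGER PRIMARY KEY, type TEXT, origin TEXT,"
    " maker TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO wine VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Burgundy", "CotesDOr", "ClosDeVougeot", 30),
        (2, "Riesling", "NewZealand", "Corbans", 20),
        (3, "Zinfandel", "EdnaValley", "Elyse", 15),
    ],
)

# A query on attributes stored in the table works as expected.
cheap = conn.execute("SELECT type FROM wine WHERE price < 25 ORDER BY id").fetchall()
print(cheap)  # [('Riesling',), ('Zinfandel',)]

# Asking for U.S. wines returns no rows: the fact that EdnaValley lies
# in the U.S. is domain knowledge, not data stored in the table.
us_wines = conn.execute("SELECT type FROM wine WHERE origin = 'US'").fetchall()
print(us_wines)  # []
```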

A human, on the other hand, has the ability to combine data with knowledge and create inferences. For example, if a wine connoisseur is asked which wine originates from the United States (U.S.), the wine connoisseur might answer Zinfandel because its origin, EdnaValley, is located in California. The information that EdnaValley is in California, and California is in the U.S., is not explicitly contained in the data of Table 1, but instead belongs to the domain knowledge of geographical regions.
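The source does not specify how virtual view query processor 312 rewrites a query. As a purely hypothetical illustration, one possible rewrite expands a predicate such as origin = 'US' into an IN-list of every place located, directly or transitively, inside the U.S., so that an ordinary relational engine can answer the connoisseur's question. The `wine` table, the `LOCATED_IN` facts (taken from the region hierarchy of FIG. 6), and the `contained_regions` and `rewrite_origin_query` functions are assumptions. The sketch uses Python's built-in `sqlite3` module in place of a hybrid relational-XML engine.

```python
import sqlite3

# Hypothetical child -> parent locatedIn facts (a subset of FIG. 6).
LOCATED_IN = {
    "EdnaValley": "California", "Mendocino": "California",
    "Grapevine": "Texas", "California": "US", "Texas": "US",
    "CotesDOr": "Bourgogne", "Bourgogne": "France",
}


def contained_regions(region: str) -> set[str]:
    """Return region plus every place located inside it, transitively."""
    found = {region}
    frontier = [region]
    while frontier:
        current = frontier.pop()
        for child, parent in LOCATED_IN.items():
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def rewrite_origin_query(region: str) -> tuple[str, list[str]]:
    """Rewrite 'origin = region' as a parameterized IN-list query."""
    places = sorted(contained_regions(region))
    placeholders = ", ".join("?" for _ in places)
    return f"SELECT type FROM wine WHERE origin IN ({placeholders})", places


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wine (id INTEGER, type TEXT, origin TEXT, maker TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO wine VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Burgundy", "CotesDOr", "ClosDeVougeot", 30),
        (2, "Riesling", "NewZealand", "Corbans", 20),
        (3, "Zinfandel", "EdnaValley", "Elyse", 15),
    ],
)
sql, params = rewrite_origin_query("US")
print(conn.execute(sql, params).fetchall())  # [('Zinfandel',)]
```

In this sketch the region knowledge is consulted at query time rather than materialized in advance. It only illustrates the general idea of query rewriting and is not the rewriting algorithm of the embodiments.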

Similarly, if asked which wine is a red wine, the wine connoisseur might answer Zinfandel and Burgundy, because the wine connoisseur knows that Zinfandel is red and that the Burgundy from Cotes D'Or is red. The wine connoisseur knows that, even though Burgundy can be either red or white, Burgundy wines originating from Cotes D'Or are always red. However, the domain knowledge needed to answer queries that involve an inference is not present in the relational table.

The different embodiments recognize that the first step in answering a query that requires an inference is to make the domain knowledge accessible to a computer by extracting information about the ontology, such as the ontology's class hierarchy. For example, a wine ontology may consist of a class hierarchy of objects, properties associated with each object class, and rules governing (a) the objects, (b) the properties of the objects, and (c) the values the properties may take.

Class Hierarchy

With reference now to FIG. 4, a block diagram of a class hierarchy in a wine ontology is depicted in which illustrative embodiments may be implemented. The class hierarchy is extracted from the wine ontology and stored in an ontology repository such as ontology repository 310 in FIG. 3.

Class hierarchy in a wine ontology 400 shows the different types of relationships in a wine ontology. The terms subclass and superclass are used to convey information about the hierarchical relationship between two classes. For example, in a class hierarchy, a class below another class is sometimes called a subclass, while a class above another class is sometimes called a superclass.

In FIG. 4, thing 402 has a subclass potableLiquid 404. PotableLiquid 404 has subclass wine 406. Wine 406 has multiple subclasses, including burgundy 408 and riesling 410. Riesling 410 has two subclasses, dryRiesling 412 and sweetRiesling 414.

Class wine 406 inherits the property locatedIn 416 from superclass thing 402. The property locatedIn 416 takes a value from the class region 418. Class wine 406 has associated with it five properties: hasSugar 420, hasBody 422, hasColor 424, hasMaker 426, and madeFromGrape 428.

Each property is associated with a range class, so that the values of the property are restricted to instances of the range class. For example, the property hasSugar 420 takes values that are instances of the wineSugar 430 class. Similarly, properties hasBody 422, hasColor 424, hasMaker 426, and madeFromGrape 428 take values that are instances of the classes wineBody 432, WineColor 434, Winery 436, and WineGrape 438, respectively.

A class can subsume or be subsumed by other classes. For example, the class Wine 406 subsumes the classes burgundy 408 and riesling 410. Similarly, dryRiesling 412 and sweetRiesling 414 are subsumed by riesling 410. The subsumption relationship creates a hierarchy of classes, typically with a general superclass such as thing 402 at the top and very specific subclasses such as dryRiesling 412 at the bottom.
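The subsumption and property-inheritance relationships of FIG. 4 can be sketched in Python with two dictionaries. One maps each class to its superclass; the other maps each class to the properties it declares, together with their range classes. The dictionary and function names are hypothetical. The sketch also assumes single inheritance (a tree), which FIG. 4 depicts but OWL in general does not require.

```python
# Hypothetical encoding of FIG. 4: each class maps to its superclass.
SUPERCLASS = {
    "potableLiquid": "thing",
    "wine": "potableLiquid",
    "burgundy": "wine",
    "riesling": "wine",
    "dryRiesling": "riesling",
    "sweetRiesling": "riesling",
}

# Properties declared directly on a class, mapped to their range class.
DECLARED_PROPERTIES = {
    "thing": {"locatedIn": "region"},
    "wine": {
        "hasSugar": "wineSugar", "hasBody": "wineBody", "hasColor": "WineColor",
        "hasMaker": "Winery", "madeFromGrape": "WineGrape",
    },
}


def ancestors(cls: str) -> list[str]:
    """Return cls followed by each of its superclasses, up to the root."""
    chain = [cls]
    while chain[-1] in SUPERCLASS:
        chain.append(SUPERCLASS[chain[-1]])
    return chain


def subsumes(general: str, specific: str) -> bool:
    """True if general is specific itself or one of its superclasses."""
    return general in ancestors(specific)


def properties_of(cls: str) -> dict[str, str]:
    """Own and inherited properties; declarations nearer cls win."""
    props: dict[str, str] = {}
    for c in reversed(ancestors(cls)):  # root first, cls last
        props.update(DECLARED_PROPERTIES.get(c, {}))
    return props


print(subsumes("wine", "dryRiesling"))           # True
print(properties_of("dryRiesling")["locatedIn"])  # region
```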

Implication Rules

With reference now to FIG. 5, rules in a wine ontology are depicted in which illustrative embodiments may be implemented. FIG. 5 provides examples of implication rules extracted from a wine ontology. In present embodiments, implication rules are stored as an implication graph in an ontology repository, such as ontology repository 310 in FIG. 3.

In rules in a wine ontology 500, rule 502 prescribes that all instances of wine in the CotesDOr class have moderate flavor. Rule 504 prescribes that all instances of wine in the CotesDOr class are of type RedBurgundy and have their origin as CotesDOrRegion. Rule 506 prescribes that all instances of wine of type RedBurgundy have type Burgundy and type RedWine. Rule 508 prescribes that all instances of wine of type RedBurgundy have PinotNoirGrape as the madeFromGrape.
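Rules 502 through 508 can be applied repeatedly until no new facts appear, a technique known as forward chaining. The Python sketch below encodes each rule as a pair of fact sets. Two choices are assumptions for illustration: membership in the CotesDOr class is encoded as the fact ("type", "CotesDOr"), and the names `RULES` and `forward_chain` are hypothetical. The source stores such rules as an implication graph, which this sketch does not attempt to reproduce.

```python
Fact = tuple[str, str]

# Rules 502-508 as (premise, conclusion) pairs of facts (hypothetical encoding).
RULES: list[tuple[set[Fact], set[Fact]]] = [
    ({("type", "CotesDOr")}, {("flavor", "Moderate")}),  # rule 502
    ({("type", "CotesDOr")}, {("type", "RedBurgundy"), ("origin", "CotesDOrRegion")}),  # rule 504
    ({("type", "RedBurgundy")}, {("type", "Burgundy"), ("type", "RedWine")}),  # rule 506
    ({("type", "RedBurgundy")}, {("madeFromGrape", "PinotNoirGrape")}),  # rule 508
]


def forward_chain(facts: set[Fact], rules: list[tuple[set[Fact], set[Fact]]]) -> set[Fact]:
    """Fire every rule whose premise holds until no new fact is added."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise <= derived and not conclusion <= derived:
                derived |= conclusion
                changed = True
    return derived


facts = forward_chain({("type", "CotesDOr")}, RULES)
print(("type", "RedWine") in facts)  # True
```

Starting from a wine known only to belong to the CotesDOr class, the chain derives RedWine. This is the kind of inference the wine connoisseur makes about the Burgundy from Cotes D'Or.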

Transitive Properties

With reference now to FIG. 6, a class hierarchy for the locatedIn property is depicted, in which illustrative embodiments may be implemented. FIG. 6 is an example of a class hierarchy extracted from a wine ontology. The class hierarchy contains transitive properties and is stored in an ontology repository, such as ontology repository 310 in FIG. 3. The class hierarchy for the locatedIn property 600 shows the locatedIn property for region object instances.

France 604, U.S. 606, Italy 608, and Germany 610 are countries located in the superclass World 602. Bourgogne 612 and Bordeaux 614 are regions located in France 604. California 616 and Texas 618 are regions located in U.S. 606. CotesDOr 620 and Mersault 622 are cities located in region Bourgogne 612. EdnaValley 624 and Mendocino 626 are cities located in region California 616. Grapevine 628 is a city located in region Texas 618.

By following the transitive properties, new inferences can be made. For example, CotesDOr 620 is in Bourgogne 612, and Bourgogne 612 is in France 604, so it can be inferred that CotesDOr 620 is in France 604. Similarly, it can be inferred that EdnaValley 624 is in U.S. 606.
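Following the transitive locatedIn links can be sketched as walking upward from a place to each enclosing region in turn. In this Python sketch, the `LOCATED_IN` dictionary encodes FIG. 6 as hypothetical child-to-parent pairs, and `enclosing_regions` and `is_located_in` are illustrative names.

```python
# Hypothetical child -> parent locatedIn facts from FIG. 6.
LOCATED_IN = {
    "France": "World", "US": "World", "Italy": "World", "Germany": "World",
    "Bourgogne": "France", "Bordeaux": "France",
    "California": "US", "Texas": "US",
    "CotesDOr": "Bourgogne", "Mersault": "Bourgogne",
    "EdnaValley": "California", "Mendocino": "California",
    "Grapevine": "Texas",
}


def enclosing_regions(place: str) -> list[str]:
    """Follow locatedIn links upward, nearest region first.

    Stops if a region repeats, as a guard against cyclic data.
    """
    chain: list[str] = []
    while place in LOCATED_IN and LOCATED_IN[place] not in chain:
        place = LOCATED_IN[place]
        chain.append(place)
    return chain


def is_located_in(place: str, region: str) -> bool:
    """Infer locatedIn(place, region) by transitivity."""
    return region in enclosing_regions(place)


print(enclosing_regions("CotesDOr"))        # ['Bourgogne', 'France', 'World']
print(is_located_in("EdnaValley", "US"))    # True
```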

The locatedIn property is a property of the thing 402 class in FIG. 4 and takes values that are instances of the Region 418 class in FIG. 4. The wine ontology may specify that the locatedIn property is transitive, so that all the locatedIn relations on region instances form a tree (or a directed acyclic graph).

The domain knowledge shown in FIG. 4, FIG. 5, and FIG. 6 is knowledge extracted from the wine ontology, and this extracted knowledge provides information that supplements the relational data in Table 1. However, the domain knowledge in FIG. 4, FIG. 5, and FIG. 6 is not in relational form, and therefore a conventional relational database management system cannot manage the knowledge extracted from the ontology.

Present embodiments recognize that it is desirable to be able to use a database management system to manage domain knowledge in addition to managing data. First, in many cases, the data already resides in the database management system, and so the database management system is able to provide users with a wide range of transactional and analytical capabilities. Second, a declarative query language such as structured query language (SQL) can insulate users from the details of the data representation.

In order for a user to be able to query a database management system containing data and domain knowledge, present embodiments address two issues. The first issue is storing and accessing the ontology. The second issue is implementing knowledge inferencing, so that knowledge may be inferred using data and ontology. Present embodiments solve both these issues by providing a framework that allows a database management system to query both data and domain knowledge.

Because ontology is structured differently than data, ontology is typically represented as semi-structured data and encoded in XML using a language such as Resource Description Framework (RDF) or Web Ontology Language (OWL). The relational data model is suited for data containing structured relationships, but is not suited for efficiently storing or processing semi-structured data.
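To make the XML encoding concrete, the Python sketch below uses the standard `xml.etree.ElementTree` module to pull subclass edges out of a small OWL document in RDF/XML syntax. The three-class fragment is written for illustration, loosely in the style of the W3C wine ontology, and is not taken from the source. The `extract_subclass_edges` name is hypothetical, and restrictions and other constructs that a real ontology would contain are ignored.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Illustrative OWL fragment in RDF/XML syntax (not from the source).
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:ID="Wine">
    <rdfs:subClassOf rdf:resource="#PotableLiquid"/>
  </owl:Class>
  <owl:Class rdf:ID="Riesling">
    <rdfs:subClassOf rdf:resource="#Wine"/>
  </owl:Class>
  <owl:Class rdf:ID="DryRiesling">
    <rdfs:subClassOf rdf:resource="#Riesling"/>
  </owl:Class>
</rdf:RDF>"""


def extract_subclass_edges(xml_text: str) -> list[tuple[str, str]]:
    """Return (subclass, superclass) pairs from owl:Class elements."""
    root = ET.fromstring(xml_text)
    edges = []
    for cls in root.iter(f"{{{OWL}}}Class"):
        child = cls.get(f"{{{RDF}}}ID")
        for sup in cls.findall(f"{{{RDFS}}}subClassOf"):
            # Skip subClassOf elements without rdf:resource (e.g. restrictions).
            parent = sup.get(f"{{{RDF}}}resource")
            if child and parent:
                edges.append((child, parent.lstrip("#")))
    return edges


print(extract_subclass_edges(DOC))
```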

In contrast, the XML data model is better suited for representing semi-structured data. However, XML's flexibility in modeling semi-structured data comes at the cost of storage overhead and query processing overhead, which is why a pure XML database is usually not deployed to handle an ontology. Thus, there is a need to model semi-structured data in a way that preserves the semi-structured form of the data and also allows the semi-structured data to be stored and queried efficiently.

Knowledge inferencing, that is, deriving inferences from the data and the associated ontology, is highly complex because it uses many details of the ontology, such as the relationships between the data and the ontology. For example, an ontological relationship may be transitive, and transitive relationships are involved in many useful queries. However, a transitive query is difficult to express and often costly to execute in terms of processing overhead.

For example, in a relational database management system (RDBMS), transitive relationships may require the execution of a set of recursive SQL queries. Recursive means that a given SQL query is repeatedly broken down into additional SQL queries, typically with each successive query operating on a smaller set of entities.
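To make the recursive-query idea concrete, the following minimal sketch uses Python's standard-library sqlite3 module and a recursive common table expression (WITH RECURSIVE) to follow a transitive "located in" relationship upward. SQLite stands in for the relational engine here, and the table SubRegion with columns child and parent is a hypothetical schema chosen only for illustration; it is not taken from the embodiments.

```python
import sqlite3


def super_regions(conn: sqlite3.Connection, region: str) -> set[str]:
    """Return every region that transitively contains `region`."""
    # The anchor SELECT finds the direct parent; each recursive step joins the
    # regions found so far with the edge table to climb one more level.
    # UNION (not UNION ALL) discards duplicates, which also stops cycles.
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(name) AS (
            SELECT parent FROM SubRegion WHERE child = ?
            UNION
            SELECT s.parent FROM SubRegion AS s
            JOIN ancestors AS a ON s.child = a.name
        )
        SELECT name FROM ancestors
        """,
        (region,),
    ).fetchall()
    return {name for (name,) in rows}


conn = sqlite3.connect(":memory:")
# Hypothetical edge table: `child` is directly located in `parent`.
conn.execute("CREATE TABLE SubRegion (child TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO SubRegion VALUES (?, ?)",
    [("EdnaValley", "California"), ("California", "U.S."),
     ("CotesDOr", "Bourgogne"), ("Bourgogne", "France")],
)
print(sorted(super_regions(conn, "EdnaValley")))  # ['California', 'U.S.']
```

Even in this small case the query must be written with explicit recursion, which illustrates why transitive queries are harder to express than ordinary selections.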

To efficiently process ontology-based queries in a database management system, one approach is to pre-process the ontology and materialize the transitive closures for all transitive relationships in the ontology. Materialize means that all transitive relationships are discovered and stored. A relationship is transitive if, whenever A is related to B and B is related to C, A is also related to C. For example, to materialize the knowledge that EdnaValley 624 is in the U.S. 606, the transitive relationships used are: (1) EdnaValley 624 is in California 616, and (2) California 616 is in the U.S. 606.
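A naive way to materialize a transitive closure is to keep deriving new pairs until nothing changes. The sketch below assumes the relationship is given as a set of (a, b) pairs; it is a simple fixpoint iteration chosen for clarity, not the pre-processing method of any particular system, and it makes visible that every derived pair must be computed and stored.

```python
def materialize_closure(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Compute the transitive closure of a binary relation given as (a, b) pairs."""
    closure = set(edges)
    while True:
        # Derive (a, d) whenever (a, b) and (b, d) are both already known.
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        new_pairs = derived - closure
        if not new_pairs:  # fixpoint reached: nothing new can be derived
            return closure
        closure |= new_pairs


located_in = {("EdnaValley", "California"), ("California", "U.S.")}
print(sorted(materialize_closure(located_in)))
# [('California', 'U.S.'), ('EdnaValley', 'California'), ('EdnaValley', 'U.S.')]
```

The derived pair ('EdnaValley', 'U.S.') is exactly the materialized knowledge described above; in a large ontology the number of such stored pairs can grow much faster than the number of original edges.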

The problem with this approach is that pre-processing all transitive relationships in the ontology incurs a cost in terms of both time and storage, because all transitive relationships have to be followed and then stored. Furthermore, once all transitive closures are materialized, updating the ontology becomes more costly, as any change to an ontology relationship may introduce significant changes that are not reflected in the materialized transitive relationships.

The database administrator is faced with the dilemma of having to incur the cost of re-processing all transitive relationships or having knowledge in the ontology that cannot be queried because the knowledge has not been materialized. Thus, materializing the ontology defeats the purpose of having an ontology that contains dynamic information and reduces the ontology to data with fixed relationships.

Creating a Framework to Query Data and Ontology

Thus, neither a pure relational database management system nor a pure XML-based approach can easily implement a framework that allows data and ontology to be queried. To solve the problem of using a database management system to manage data and an ontology, embodiments extract specific information from an ontology and use a hybrid relational-XML database management system, such as DBMS 304 in FIG. 3, to store the data, the ontology, and the extracted information. The framework allows a user to express and process ontology-based semantic queries.

To support ontology-based semantic queries, the relational database management system is augmented so that knowledge representation can be incorporated into the relational framework. Augmenting the relational database management system allows knowledge to be queried in a way similar to how data is queried. In other words, the user can create a query, similar to a conventional relational query, which results in inferences based on the ontology.

The framework provides the user with a relational virtual view of both the data and the domain knowledge, and allows the user to query the data and the domain knowledge. The relational virtual view is a virtual view such as virtual view 306 in FIG. 3. The relational virtual view is created by specifying how the data, encoded in relational tables, such as base table 308 in FIG. 3, relates to the domain knowledge, encoded as one or more ontologies in an ontology repository, such as ontology repository 310 in FIG. 3. Once the data is integrated in this way with the domain knowledge, new knowledge, such as inferences based on the relationships between the data and the ontology, may be derived.

The virtual view is an interface through which users may query data, domain knowledge, or derived knowledge in a seamless and unified manner. To provide the virtual view, embodiments use a database management system capable of native XML support augmented with an ontology repository for managing ontological information.

Before an ontology can be used in the database management system, the ontology's files are first registered with the ontology repository. The ontology files are then pre-processed into a representation more suitable for query processing. Class hierarchies and transitive properties are extracted into trees, and implications are extracted into implication graphs. These trees and implication graphs are encoded and stored as XML data and used to create the virtual view. Once the virtual view is created, SQL queries may be written and executed as if the virtual view were just another relational table.

TABLE 2
ID | Type      | Origin     | Maker         | Price | locatedIn           | hasColor
1  | Burgundy  | CotesDOr   | ClosDeVougeot | 30    | {Bourgogne, France} | Red
2  | Riesling  | NewZealand | Corbans       | 20    | { }                 | White
3  | Zinfandel | EdnaValley | Elyse         | 15    | {California, U.S.}  | Red

Table 2 shows a virtual view, such as virtual view 306 in FIG. 3, in which Table 1 has been augmented with two virtual columns, locatedIn and hasColor. The virtual view displays information from a base table alongside related information extracted from an ontology. The first five columns of Table 2, ID, Type, Origin, Maker, and Price, are taken from a base table such as base table 308 in FIG. 3. The two virtual columns, locatedIn and hasColor, are taken from extracted ontology information stored in an ontology repository, such as ontology repository 310 in FIG. 3.

LocatedIn consists of a set of locations, $\{y_1, y_2, \ldots, y_n\}$, where, for every wine of Origin $x$, $x$ is a sub-region of each $y_i$. For example, wine Burgundy originates from CotesDOr, which is a sub-region (subclass) of Bourgogne, which in turn is a sub-region of France. Thus, by following the transitive relationships in the region hierarchy, the value of locatedIn for Burgundy is found to be {Bourgogne, France}.
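The upward walk that produces locatedIn can be sketched in a few lines. This minimal example assumes a hypothetical PARENT_REGION map in which each region has at most one direct parent and the hierarchy has no cycles; an ontology in which a region has several parents would need a graph traversal instead.

```python
# Hypothetical direct sub-region edges from the location ontology:
# each region maps to the single region that directly contains it.
PARENT_REGION = {
    "CotesDOr": "Bourgogne",
    "Bourgogne": "France",
    "EdnaValley": "California",
    "California": "U.S.",
}


def located_in(origin: str) -> list[str]:
    """Follow the sub-region hierarchy upward from a wine's origin."""
    result = []
    region = PARENT_REGION.get(origin)
    while region is not None:
        result.append(region)  # each step adds one enclosing region y_i
        region = PARENT_REGION.get(region)
    return result


print(located_in("CotesDOr"))    # ['Bourgogne', 'France']
print(located_in("NewZealand"))  # []
```

The empty result for NewZealand corresponds to the empty locatedIn set for the Riesling row in Table 2.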

Similarly, the virtual column hasColor is derived from the wine ontology. The ontology includes a set of implication rules, such as, for example:

$$(\text{type} = \text{Zinfandel}) \Rightarrow (\text{hasColor} = \text{red})$$

$$(\text{type} = \text{Riesling}) \Rightarrow (\text{hasColor} = \text{white})$$

The symbol $\Rightarrow$ denotes that the left hand side (LHS) of the symbol implies the right hand side (RHS), as in “A wine of type Zinfandel implies the wine has the color red”. The symbol $\Rightarrow$ may also be read as an “If . . . then . . . ” statement, as in “If a wine is of type Zinfandel then the wine has the color red”. Thus, for wines of type Zinfandel, we can derive the value of hasColor to be red.
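Applying such rules to a row can be sketched as a simple lookup over (LHS, RHS) pairs. The RULES list below encodes only the two rules shown above, and the function name derive is hypothetical; a real implementation would draw its rules from the implication graph stored in the ontology repository.

```python
from typing import Optional

# Hypothetical encoding of the two implication rules shown above:
# ((LHS attribute, LHS value), (RHS attribute, RHS value)).
RULES = [
    (("type", "Zinfandel"), ("hasColor", "red")),
    (("type", "Riesling"), ("hasColor", "white")),
]


def derive(row: dict, attribute: str) -> Optional[str]:
    """Return the value of `attribute` implied by a rule whose LHS holds for `row`."""
    for (lhs_attr, lhs_value), (rhs_attr, rhs_value) in RULES:
        # A rule fires only if its RHS concerns the requested attribute
        # and its LHS clause is true for this row.
        if rhs_attr == attribute and row.get(lhs_attr) == lhs_value:
            return rhs_value
    return None  # no rule applies, so nothing can be inferred


print(derive({"type": "Zinfandel"}, "hasColor"))  # red
```

Because only two rules are encoded here, a Burgundy row yields None; the full wine ontology would contain further rules.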

Any number of virtual columns may be appended to the original table, Table 1. The virtual view incorporates both the data and the domain knowledge associated with the data. However, because it is a virtual view, none of the values in the virtual columns are actually materialized. Instead, the values in the virtual columns are derived (inferred) only when a query is made that requires that the values be derived.

The purpose of the virtual view is to (a) show the user what information can be queried and (b) provide the system with the relationships between the data and the ontology needed to derive values. The system is able to use virtual columns to derive values from the raw data and the ontology when needed, in real time. A unified view of the data and the ontology makes it relatively easy for users to make queries that manipulate both the data and the ontology.

With reference now to FIG. 7, a diagram depicting database commands, in which illustrative embodiments may be implemented, is depicted. Query 702 is a query against a virtual view, such as virtual view 306 in FIG. 3. Query 702 finds all wines in the database which originate from the United States. Similarly, query 704 is a query against the virtual view which finds all red wines in the database.
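The following sketch illustrates how a virtual view can answer queries like 702 and 704 without ever storing the virtual columns: each virtual column is a function evaluated only while a query runs. The query text of FIG. 7 is not reproduced here; the query function, the VIRTUAL_COLUMNS map, and the COLOR_RULES lookup (a stand-in for real implication reasoning) are all hypothetical.

```python
from typing import Callable, Iterator

# Base table rows (Table 1); only the stored columns are present.
WINES = [
    {"ID": 1, "Type": "Burgundy", "Origin": "CotesDOr", "Maker": "ClosDeVougeot", "Price": 30},
    {"ID": 2, "Type": "Riesling", "Origin": "NewZealand", "Maker": "Corbans", "Price": 20},
    {"ID": 3, "Type": "Zinfandel", "Origin": "EdnaValley", "Maker": "Elyse", "Price": 15},
]

# Hypothetical ontology fragments used to derive the virtual columns.
PARENT_REGION = {"CotesDOr": "Bourgogne", "Bourgogne": "France",
                 "EdnaValley": "California", "California": "U.S."}
COLOR_RULES = {"Burgundy": "Red", "Riesling": "White", "Zinfandel": "Red"}


def located_in(row: dict) -> set:
    """Walk the region hierarchy upward from the row's Origin."""
    regions, region = set(), PARENT_REGION.get(row["Origin"])
    while region is not None:
        regions.add(region)
        region = PARENT_REGION.get(region)
    return regions


# Virtual columns are functions of a base row, not stored values.
VIRTUAL_COLUMNS: dict[str, Callable[[dict], object]] = {
    "locatedIn": located_in,
    "hasColor": lambda row: COLOR_RULES.get(row["Type"]),
}


def query(column: str, predicate: Callable[[object], bool]) -> Iterator[int]:
    """Yield IDs of rows whose (possibly virtual) column satisfies `predicate`.

    Virtual values are computed here, at query time, and never stored.
    """
    for row in WINES:
        if column in VIRTUAL_COLUMNS:
            value = VIRTUAL_COLUMNS[column](row)
        else:
            value = row[column]
        if predicate(value):
            yield row["ID"]


print(list(query("locatedIn", lambda regions: "U.S." in regions)))  # [3]
print(list(query("hasColor", lambda color: color == "Red")))        # [1, 3]
```

Stored and virtual columns are queried through the same interface, which is the property the virtual view is meant to provide.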

From a purely relational database standpoint, the schema shown in Table 2 appears to violate relational normal forms because, for example, one location may be a sub-region of many other locations. For example, for a given wine, the set of locatedIn values depends on the origin of the wine, and so the two columns titled “Origin” and “locatedIn” could be isolated and made into their own table.

However, the schema of Table 2 is allowed to violate relational normal forms because Table 2 is merely a virtual view. The virtual view allows a database user to query the data and the domain knowledge as if they were stored in relational tables.

For example, suppose there is a “knowledge” table called RegionKnowledge(region, superRegions), which stores, for each region, all super regions as a set. Thus, (CotesDOr, {Bourgogne, France}) is an example of an entry in this knowledge table. From a user's perspective, the virtual view appears to be the result of joining the wine table with a “knowledge” table as shown in command 706.

It is important to note that command 706 is never actually executed to join the wine table with a knowledge table, but represents what the virtual view shows the user. The virtual view does not in fact exist in the system as a materialized table, because the data and ontology are never “joined” together, even at the time the query is made. A virtual view is thus significantly different from a traditional view because the “knowledge” table used to create the view does not actually exist.

Instead, the system understands how to derive the values of the virtual columns from the values in the base table by, for example, reasoning over the ontology. Reasoning over the ontology means that values of the virtual columns are filled automatically when a query is issued against the virtual view. The process of creating a virtual view informs the system as to how the values for the virtual columns are derived. This is discussed in more detail below.

Integrating Relational Tables and Ontology

Underlying the virtual view are the data and the associated ontology. The data is stored in relational tables while the ontology is stored in XML. When properly associated together, the data and the ontology may be queried through the virtual view to produce new knowledge in the form of inferences. The data and the ontology are associated using a CREATE VIRTUAL VIEW statement, one of the language extensions in the illustrative embodiments used to support semantic queries in a database management system.

With reference now to FIG. 8, a diagram depicting a virtual view command, in which illustrative embodiments may be implemented, is depicted. Virtual view command 800 creates a virtual view, such as virtual view 306 in FIG. 3, which integrates a base table, such as base table 308 in FIG. 3, with an ontology in an ontology repository, such as ontology repository 310 in FIG. 3. FIG. 8 shows how the virtual view in Table 2 is created from Table 1 and an associated ontology.

A virtual view, such as virtual view 306, is registered with a database management system, such as DBMS 304 in FIG. 3. Here, in line 802, CREATE VIRTUAL VIEW is used to register the virtual view WineView with the database management system. CREATE VIRTUAL VIEW associates a wine table with an ontology. After the virtual view is registered, a user, such as user 302 in FIG. 3, may issue queries against the virtual view as if the data and ontology were in a relational table.

The CREATE VIRTUAL VIEW statement is a type of join operation between the wine table and the ontology. One way to understand the join operation is to view the ontology hierarchy as a class hierarchy in an object-oriented programming language, and to view the join operation as using data from the relational table to instantiate new objects.

The sources of the virtual view WineView are the Wine table and the WineOntology, which are specified in the FROM clause in line 804. The constraints in the WHERE clause in line 806 specify how the wine table and the wine ontology are integrated. The constraint O.object=W.type in line 806 instantiates an ontology object using the value of W.type.

For example, the first row in Table 1 is for a wine of type Burgundy, so the constraint becomes O.object=‘Burgundy’. The constraint O.object.isA(‘Wine’) in line 808 requires that the newly instantiated object be an instance of the Wine class, which is true for Burgundy. This maps each row of the wine table to an instance of Wine in the wine ontology.

Line 810 specifies that the origin column of the wine table corresponds to Burgundy's locatedIn attribute (which is inherited from class Thing). Line 812 specifies that the maker column corresponds to the wine's hasMaker attribute. Note that O.object.hasMaker is only meaningful when O.object is an instance of the Wine class. The result of the CREATE VIRTUAL VIEW statement is a schema that includes two virtual columns, locatedIn and hasColor, created from the associated ontology.

The SELECT clause in line 802 has three items. Item W.* indicates that the schema of the virtual view contains all the columns (Id, Type, Origin, Maker, Price) in the original wine table (Table 1). Item TC(O.object.locatedIn, ‘subRegion’) in line 802 specifies a virtual column whose value is computed by the transitive closure function TC. The transitive closure function expands a region upward along the ‘subRegion’ relationship in the location ontology, resulting in a set of locations that contain the region specified by O.object.locatedIn.

Item O.object.hasColor in line 802 specifies a virtual column based on an attribute or property of the wine object in the ontology. The attribute value is derived using ontological rules at the time the query is made. The registration of the virtual view creates a mapping between values in the relational table and the ontology, enabling the system to perform knowledge inferencing for queries against the virtual view.

The ontology is stored as semi-structured data in an ontology repository, such as ontology repository 310 in FIG. 3. As previously discussed, a conventional relational database management system cannot directly handle semi-structured data. The framework uses a hybrid relational-XML database management system, such as DBMS 304 in FIG. 3, to provide physical level support for the ontology. Because XML is now a standard for data retrieval and exchange, some relational database management systems now support XML data in native form. For example, International Business Machines' (IBM) DB2™ Universal Database provides native support for XML data.

The framework uses a hybrid relational-XML database management system, such as IBM's DB2™, where an existing relational database management system has been extended using the following four components. First, an ontology repository, such as ontology repository 310 in FIG. 3, is added to provide native XML storage so that an XML document can be stored as an instance of the XQuery Data Model (QDM), that is, as a structured, typed, binary tree. Second, new index types for XML data are created, including structural indexes, value indexes, and full-text indexes. Third, a hybrid query processor, such as virtual view query processor 312 in FIG. 3, is added to process queries formulated using XQuery and SQL. Fourth, an enhanced query engine, such as query engine 314 in FIG. 3, is added to support XQuery and SQL/X operators.

In a hybrid relational-XML database management system, XML is supported as a basic data type. Users can create a table with one or more XML type columns. A collection of XML documents can therefore be defined as a column in a table.

With reference now to FIG. 9, commands to a hybrid relational-XML database, in which illustrative embodiments may be implemented, are depicted. Line 902 shows a command a user can use with a hybrid relational-XML database to create a table ClassHierarchy. Line 904 shows sample code a user can use to insert an XML document into a table. The XML document is parsed, placed into native XML storage, and indexed. The SQL/X function, XMLParse, is used to insert an XML document into a table.

A user can query relational columns and XML columns together by issuing a SQL/X query. Line 906 is an example of a query which returns the class ids and class names of all the class hierarchies that contain the XPath /Wine/DessertWine/SweetRiesling.

XMLExists is a SQL/X boolean function that evaluates an XPath expression on an XML value. If the XPath expression returns a nonempty sequence of nodes, XMLExists is true; otherwise, it is false.
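The semantics of XMLExists (true exactly when the path selects at least one node) can be approximated outside the database with Python's xml.etree.ElementTree. This is only an illustration, not DB2's implementation: ElementTree supports a limited XPath subset evaluated relative to the root element, so the hypothetical helper xml_exists below handles only simple absolute child-step paths such as the one in line 906.

```python
import xml.etree.ElementTree as ET


def xml_exists(document: str, path: str) -> bool:
    """Approximate XMLExists: true if the path selects a nonempty node sequence."""
    root = ET.fromstring(document)
    # ElementTree evaluates paths relative to the root element, so the
    # absolute path is split into its first (root) step and the remainder.
    steps = path.strip("/").split("/")
    if root.tag != steps[0]:
        return False
    if len(steps) == 1:
        return True
    return len(root.findall("/".join(steps[1:]))) > 0


hierarchy = "<Wine><DessertWine><SweetRiesling/></DessertWine><WhiteWine/></Wine>"
print(xml_exists(hierarchy, "/Wine/DessertWine/SweetRiesling"))  # True
print(xml_exists(hierarchy, "/Wine/WhiteWine/SweetRiesling"))    # False
```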

Ontology Repository

In order to support ontologies in the database management system, the database management system, such as DBMS 304, is augmented with an ontology repository, such as ontology repository 310 in FIG. 3. An ontology repository consists of a collection of information associated with one or more ontologies that a user has registered with the ontology repository.

From the user's perspective, the ontology repository contains one or more ontology files and their corresponding identifiers (ontIDs). Besides being a storage system for ontology files, the ontology repository also hides much of the complexity of the ontology-related processing from the user.

With reference now to FIG. 10, a block diagram depicting a user interaction with an ontology repository, in which illustrative embodiments may be implemented, is depicted. In user interaction with an ontology repository 1000, user 1002 provides one or more ontology files 1004 and an ontology identifier 1006 to ontology repository 1008. Ontology repository 1008 is an example of ontology repository 310 in FIG. 3.

Ontology processor 1010 registers ontology files 1004 to ontology identifier 1006 so that the user can later reference that specific ontology. More than one ontology file may be registered to a specific ontology identifier. Multiple sets of ontologies may be registered, with each ontology having a unique ontology identifier. Ontology processor 1010 performs various operations on ontology files 1004, including extracting a variety of information from the ontology in order to facilitate query processing.

For example, ontology processor 1010 may extract from ontology files 1004 the ontology's class hierarchy 1012, transitive properties 1014, and implication graph 1016. Class hierarchy 1012, transitive properties 1014, and implication graph 1016 are stored in ontology repository 1008. Ontology processor 1010 may extract additional information to, for example, support specific query types, or to optimize query processing. Ontology processor 1010 stores the extracted information in ontology repository 1008. Ontology processor 1010 may also store the original files, ontology files 1004, in ontology repository 1008.

With reference now to FIG. 11, a block diagram depicting extracted information, in which illustrative embodiments may be implemented, is depicted. Extracted information 1100 shows an example of the three types of ontology information that may be extracted and stored in an ontology repository, such as ontology repository 1008 in FIG. 10.

Here, the three types of extracted information are stored in three tables, OntologyDocs 1102, OntologyInfo 1104, and TransitiveProperty 1106, in the ontology repository. The OntologyDocs 1102 table stores a copy of the original ontology files which the user registered. Tables OntologyInfo 1104 and TransitiveProperty 1106 store additional information extracted from the ontology files. Some of the fields of each table may contain pointers to XML representations of the documents or the extracted information.

Here, the extracted ontology information is shown stored in tables for illustration purposes. Those skilled in the art will appreciate that any type of data structure that serves the same purpose as a table may be used to store the extracted ontology information.

OntologyInfo 1104 table may contain various fields such as ontology identifier ontID 1108, class 1110, and imply 1112. Here, class 1110 contains information on each class in an ontology, while imply 1112 has fields containing information about the implications associated with each class.

Similarly, TransitiveProperty 1106 has various fields, including ontology identifier ontID 1114, property identifier propID 1116, and tree 1118. PropID 1116 contains information about each property, while tree 1118 is a field which contains a pointer to an XML tree representation of one of the transitive properties in the ontology.

The next section describes how the user can register an ontology with or remove (drop) an ontology from an ontology repository, such as ontology repository 1008 in FIG. 10, and how ontology processor 1010 in FIG. 10 extracts various information such as the class hierarchies, transitive properties, and implication graphs from the ontology files. The examples use ontologies encoded as web ontology language (OWL) files, but it should be understood that the different embodiments are not restricted to ontologies encoded using any specific ontology language.

The ontology repository provides a user interface for a user to manage ontology files. The user supplies a unique ontology identifier (ontID) to identify each unique ontology. Each ontology may be encoded into one or more ontology files. The ontology repository's interface allows a user to register one or more ontology files as part of an ontology, and delete one or more of the files associated with an ontology.

For example, a user interface for an ontology repository might provide a procedure registerOntology(ontid, ontology_File) that allows a user to register an ontology file using a unique identifier. If the logical ontology consists of several ontology files, the user can call the register procedure with the same ontID for each file in the ontology. All ontology files registered with the same ontID are grouped together internally for the extraction of the class hierarchies, transitive properties, and implication graphs. To remove a registered ontology from the repository, the drop ontology procedure dropOntology(ontid) can be used to delete the ontology files and the extracted information files associated with the specified ontology ID.
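The grouping behavior of registerOntology and dropOntology can be sketched with an in-memory class. The class OntologyRepository and its snake_case methods register_ontology and drop_ontology are hypothetical stand-ins for the procedures named above; the sketch assumes that registering a new file invalidates any information previously extracted for that ontID.

```python
from collections import defaultdict


class OntologyRepository:
    """In-memory stand-in for an ontology repository (hypothetical API)."""

    def __init__(self) -> None:
        # ontID -> list of registered ontology file references
        self.files: dict[str, list[str]] = defaultdict(list)
        # ontID -> extracted information (class hierarchies, trees, graphs)
        self.extracted: dict[str, dict] = {}

    def register_ontology(self, ont_id: str, ontology_file: str) -> None:
        # Files registered under the same ontID form one logical ontology.
        self.files[ont_id].append(ontology_file)
        # Anything extracted earlier for this ontID is now stale.
        self.extracted.pop(ont_id, None)

    def drop_ontology(self, ont_id: str) -> None:
        # Remove both the original files and the information extracted from them.
        self.files.pop(ont_id, None)
        self.extracted.pop(ont_id, None)


repo = OntologyRepository()
repo.register_ontology("wine", "wine.owl")
repo.register_ontology("wine", "regions.owl")
print(repo.files["wine"])  # ['wine.owl', 'regions.owl']
repo.drop_ontology("wine")
print("wine" in repo.files)  # False
```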

Once the user has finished registering all the files associated with an ontology, the ontology files are parsed to extract various pieces of information, such as the class hierarchies, transitive properties, and the implication graph. The extracted pieces of information are used to facilitate query rewriting and processing.

Class Hierarchies

The following describes how class hierarchies may be extracted from an ontology. The subclass relationship that specifies class hierarchies can be expressed in several different ways using web ontology language (OWL). Moreover, the subclass hierarchies that are captured may not necessarily be disjoint. Class hierarchies are extracted from ontology files by an ontology processor, such as ontology processor 1010 in FIG. 10, and stored in an ontology repository, such as ontology repository 1008 in FIG. 10.

With reference now to FIG. 12, an example of code for constructing a Wine class hierarchy is depicted. Line 1202 provides an example of how a Wine class hierarchy may be constructed by explicitly specifying subclasses in a subClassOf construct using web ontology language.

With reference now to FIG. 13, a block diagram of a class and sample code is illustrated. First, the Wine class hierarchy is extracted and initially represented as shown at 1302, with DessertWine 1304 a subclass of Wine 1306.

Second, the subclass relationship is implicitly specified using restrictions. For example, consider the web ontology language fragment of line 1308, where the WhiteWine class is defined to be all wines whose hasColor attribute has the value white.

The definition in line 1308 implies a subclass relationship between Wine and WhiteWine, and so the corresponding edge may now be added into the class hierarchy as illustrated in 1310, with DessertWine 1312 and WhiteWine 1314 as subclasses of Wine 1316.

Third, subclass relationships are expressed using binary set relations such as, for example, the intersection operator or the union operator. Line 1318 shows an example of web ontology language in which WhiteBurgundy is defined as the intersection of Burgundy and WhiteWine. Therefore, WhiteBurgundy is a subclass of both Burgundy and WhiteWine, and the class hierarchy now appears as shown in class hierarchy 1320. In class hierarchy 1320, WhiteBurgundy 1322 is a subclass of Burgundy 1324, WhiteBurgundy 1326 is a subclass of WhiteWine 1328, and Burgundy 1324 and WhiteWine 1328 are both subclasses of Wine 1330.
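The three ways of expressing subclass relationships can be sketched as one extraction pass. Real OWL parsing is out of scope here: the tuples in statements are simplified, hypothetical stand-ins for subClassOf, property-restriction, intersectionOf, and unionOf constructs, and the function returns each class's direct superclasses.

```python
from collections import defaultdict

# Simplified, hypothetical stand-ins for the OWL constructs discussed above.
statements = [
    ("subClassOf", "DessertWine", "Wine"),                      # explicit subclass
    ("subClassOf", "Burgundy", "Wine"),
    ("restriction", "WhiteWine", "Wine", "hasColor", "White"),  # Wine with hasColor=White
    ("intersectionOf", "WhiteBurgundy", ["Burgundy", "WhiteWine"]),
]


def extract_hierarchy(stmts) -> dict[str, set[str]]:
    """Return a map from each class to its direct superclasses."""
    parents: dict[str, set[str]] = defaultdict(set)
    for stmt in stmts:
        kind, cls = stmt[0], stmt[1]
        if kind == "subClassOf":
            parents[cls].add(stmt[2])
        elif kind == "restriction":
            # "Members of X whose property has value v" is a subclass of X.
            parents[cls].add(stmt[2])
        elif kind == "intersectionOf":
            # An intersection is a subclass of every operand.
            parents[cls].update(stmt[2])
        elif kind == "unionOf":
            # Every operand is a subclass of the union.
            for operand in stmt[2]:
                parents[operand].add(cls)
    return dict(parents)


hierarchy = extract_hierarchy(statements)
print(sorted(hierarchy["WhiteBurgundy"]))  # ['Burgundy', 'WhiteWine']
```

Because WhiteBurgundy receives two parents, the result is a directed acyclic graph rather than a strict tree, consistent with the note that the captured hierarchies need not be disjoint.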

In a hierarchical diagram, such as class hierarchy 1320, each class, such as Wine 1330, is called a node, while the line between a class and a subclass is called an edge. For example, the line between Burgundy 1324 and WhiteBurgundy 1322 is an edge.

In this representation, each node represents a class, while each edge represents a subclass-of relationship. The subclass-of relationship is transitive, and so if $A \rightarrow B \rightarrow C$ exists in the class hierarchy, then the following subsumption statement holds for any instance $x$:

$$(x \in A) \Rightarrow (x \in B) \Rightarrow (x \in C)$$

For example, here, WhiteBurgundy 1322 is a subclass of Burgundy 1324, and Burgundy 1324 is a subclass of Wine 1330, so WhiteBurgundy 1322 is transitively a subclass of Wine 1330. Moreover, any subclass of WhiteBurgundy 1322 will always be a subclass of Wine 1330.
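A transitive subsumption check amounts to a reachability test over subclass-of edges. This sketch assumes a hypothetical PARENTS map (class to direct superclasses) matching class hierarchy 1320; it uses an explicit stack and a visited set so that classes with several parents are handled and no node is expanded twice. Note that, as written, a class is treated as subsuming itself.

```python
def is_subclass(cls: str, ancestor: str, parents: dict[str, set[str]]) -> bool:
    """True if `ancestor` is reachable from `cls` by following subclass-of edges."""
    stack, seen = [cls], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))  # climb to direct superclasses
    return False


PARENTS = {
    "WhiteBurgundy": {"Burgundy", "WhiteWine"},
    "Burgundy": {"Wine"},
    "WhiteWine": {"Wine"},
    "DessertWine": {"Wine"},
}
print(is_subclass("WhiteBurgundy", "Wine", PARENTS))  # True
print(is_subclass("Burgundy", "WhiteWine", PARENTS))  # False
```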

Transitive Properties

Besides class hierarchies, an ontology processor, such as ontology processor 1010 in FIG. 10, also extracts transitive relationships from the ontology and stores the transitive relationships in the form of a tree to facilitate query rewriting and processing. The transitive properties are typically stored in XML form in an ontology repository, such as ontology repository 1008 in FIG. 10.

With reference now to FIG. 14, an example of code for specifying transitive properties of a Wine ontology is depicted. Code segment 1402 is an example of web ontology language (OWL) code for specifying that the binary relationship (owl:ObjectProperty) is transitive. In this example, the locatedIn property relates the Thing class to the Region class and is defined to be transitive. Transitive means:

$$\text{locatedIn}(a, b) \land \text{locatedIn}(b, c) \Rightarrow \text{locatedIn}(a, c)$$

During extraction, once the ontology processor discovers that the locatedIn property is transitive, all instances of that property are scanned and a tree or forest is constructed from the extracted information. For example, code segment 1404 shows extracted instances of the locatedIn property. Based on the properties of code segment 1404, transitive tree 1406 may be constructed. Here, all internal nodes must be instances of the Region class. The leaf nodes need only be instances of the Thing class. All the edges denote subsumption via transitivity of the locatedIn property.
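Building the forest from scanned property instances can be sketched as follows. The LOCATED_IN facts and REGIONS set are hypothetical examples, since the contents of code segment 1404 are not reproduced here; the sketch enforces the rule that any node containing another node must be a Region, while leaves may be any Thing. It assumes each node has at most one container, as a forest requires.

```python
from collections import defaultdict

# Hypothetical locatedIn(a, b) facts standing in for code segment 1404.
LOCATED_IN = [
    ("ClosDeVougeot", "CotesDOr"),
    ("CotesDOr", "Bourgogne"),
    ("Bourgogne", "France"),
    ("EdnaValley", "California"),
    ("California", "U.S."),
]
REGIONS = {"CotesDOr", "Bourgogne", "France", "EdnaValley", "California", "U.S."}


def build_transitive_forest(facts, regions):
    """Turn locatedIn(a, b) facts into container -> contained edges and a root list."""
    children = defaultdict(list)
    contained = set()
    for a, b in facts:
        # A node that contains something is internal, so it must be a Region.
        if b not in regions:
            raise ValueError(f"{b!r} contains {a!r} but is not a Region")
        children[b].append(a)
        contained.add(a)
    # Roots contain other nodes but are not themselves contained.
    roots = sorted(set(children) - contained)
    return dict(children), roots


children, roots = build_transitive_forest(LOCATED_IN, REGIONS)
print(roots)  # ['France', 'U.S.']
```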

Implications

An ontology processor, such as ontology processor 1010 in FIG. 10, also extracts implications from the ontology in the form of rules such as $A \Rightarrow B$. The implication rules are stored in the form of an implication graph. The implication graph is stored in an ontology repository, such as ontology repository 1008 in FIG. 10. The implication graph enables knowledge to be inferred from the data and the ontology.

An important type of implication is class subsumption. A transitive tree may be used to capture implications related to class subsumption. Implications other than class subsumptions are general implications that do not involve subsumption via class memberships or transitive relationships.

There are three types of general implications: complex, conjunctive, and disjunctive implications. The ontology repository constructs and stores an implication graph for all the general implications in the ontology. The implication graph is used during query processing to rewrite the query.

Complex Implications

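Complex implications are implications where the left hand side (LHS) is a conjunction or disjunction of clauses. Take, for example, the following web ontology language fragment, where the symbol $\Leftrightarrow$ indicates equivalency:

$$(x \in \text{WhiteWine}) \Leftrightarrow (x \in \text{Wine}) \land (x.\text{hasColor} = \text{White})$$

An equivalency is the same as a pair of implications, one in each direction:

$$(x \in \text{WhiteWine}) \Rightarrow (x \in \text{Wine}) \land (x.\text{hasColor} = \text{White})$$

$$(x \in \text{Wine}) \land (x.\text{hasColor} = \text{White}) \Rightarrow (x \in \text{WhiteWine})$$

In this example, the last implication is a complex implication, because the left hand side is a conjunction of the two clauses $(x \in \text{Wine})$ and $(x.\text{hasColor} = \text{White})$.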

Conjunctive Implications

With reference now to FIG. 15, an example of a conjunctive implication is depicted. A conjunctive implication is an implication where the right hand side (RHS) is a conjunction of clauses. A conjunctive implication may be part of an implication graph that is extracted by an ontology processor, such as ontology processor 1010 in FIG. 10, and stored in an ontology repository, such as ontology repository 1008 in FIG. 10.

In code segment 1502, all instances of the Zinfandel class also belong to the subclass of all wines whose hasColor property takes the value red, and to the subclass of all wines whose hasSugar property takes the value dry:

$$(x \in \text{Zinfandel}) \Rightarrow [(x.\text{hasColor} = \text{Red}) \land (x.\text{hasSugar} = \text{Dry})]$$

“If wine x is a Zinfandel, then wine x is red and wine x is dry.” An important property of conjunctive implications is that they can be decomposed into a conjunction of simple implications.

For example, the conjunctive implication above can be decomposed into the following:

$$[(x \in \text{Zinfandel}) \Rightarrow (x.\text{hasColor} = \text{Red})] \land [(x \in \text{Zinfandel}) \Rightarrow (x.\text{hasSugar} = \text{Dry})]$$

“If wine x is a Zinfandel, then wine x is red, and if wine x is a Zinfandel, then wine x is dry.” By decomposing conjunctive implications into a conjunction of simple implications, conjunctive implications may be processed as a collection of simple implications that are all joined by conjunction.
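The decomposition step is mechanical: an implication $L \Rightarrow (C_1 \land \ldots \land C_n)$ becomes the $n$ simple implications $L \Rightarrow C_i$. The sketch below assumes a hypothetical Clause record with a subject, an operator ("in" for class membership, "=" for attribute values), and a value; it is an illustration of the rule, not the ontology processor's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    subject: str  # e.g. "x" or "x.hasColor"
    op: str       # "in" for class membership, "=" for attribute values
    value: str


def decompose(lhs: Clause, rhs_conjuncts: list[Clause]) -> list[tuple[Clause, Clause]]:
    """Split LHS => (C1 AND ... AND Cn) into the simple implications LHS => Ci."""
    return [(lhs, clause) for clause in rhs_conjuncts]


zinfandel = Clause("x", "in", "Zinfandel")
rules = decompose(zinfandel, [Clause("x.hasColor", "=", "Red"),
                              Clause("x.hasSugar", "=", "Dry")])
for lhs, rhs in rules:
    print(f"({lhs.subject} {lhs.op} {lhs.value}) => ({rhs.subject} {rhs.op} {rhs.value})")
# (x in Zinfandel) => (x.hasColor = Red)
# (x in Zinfandel) => (x.hasSugar = Dry)
```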

Disjunctive Implications

A disjunctive implication may be part of an implication graph that is extracted by an ontology processor, such as ontology processor 1010 in FIG. 10, and stored in an ontology repository, such as ontology repository 1008 in FIG. 10. A disjunctive implication is an implication rule whose right hand side is a disjunction of clauses.

With reference now to FIG. 16, an example of a disjunctive implication is depicted. The code segment 1602 illustrates how the implication rule:

$$(x \in \text{Zinfandel}) \Rightarrow (x.\text{hasBody} = \text{Full}) \lor (x.\text{hasBody} = \text{Medium})$$

“If wine x is a Zinfandel, then x has a body that is full or x has a body that is medium” may be encoded in web ontology language (OWL).

Implication Graph

An implication graph is extracted by an ontology processor, such as ontology processor 1010 in FIG. 10, and stored in an ontology repository, such as ontology repository 1008 in FIG. 10. An implication graph is a directed graph consisting of two types of vertices, clause and operator, and two types of edges, imply and operand. Clause vertices denote clauses, such as x.hasBody=Medium, that have truth values. Operator vertices denote the conjunction or disjunction operator.

An operator vertex is also associated with a truth value that is dependent upon the clause vertices that it joins together via either a conjunction or disjunction. Imply edges denote the implication relationship between vertices. Operand edges associate clause vertices to operator vertices.

With reference now to FIG. 17, a block diagram of an implication graph of an illustrative embodiment is depicted. An implication graph for an ontology is constructed by starting with an empty implication graph and then scanning the ontology files for all implications. In FIG. 17, implication graph 1702 is the graphical representation for the set of implications 1704.

When extracting implication rules from the ontology, the ontology processor filters out implications associated with class hierarchies and transitive properties, leaving only the general implications. The ontology processor then iterates through each general implication, and classifies the implication as either complex, conjunctive, or disjunctive. If the implication is conjunctive, the conjunctive implication is further decomposed into a set of simple implications. Finally, vertices and edges corresponding to the current implication are inserted into the implication graph.
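The insertion step can be sketched with a small graph class. This is an illustration under stated assumptions: clauses are plain strings, an operator side is a hypothetical tuple ("AND" or "OR", [clauses]), operator vertex ids such as "OR#0" are invented for the example, and the filtering of class-hierarchy and transitive-property implications is assumed to have already happened. A conjunctive right hand side is decomposed into one imply edge per conjunct, while a disjunctive right hand side or a complex left hand side becomes an operator vertex joined to its clauses by operand edges.

```python
class ImplicationGraph:
    """Clause/operator vertices joined by imply/operand edges (illustrative sketch)."""

    def __init__(self) -> None:
        self.kind: dict[str, str] = {}               # vertex id -> "clause" | "operator"
        self.edges: list[tuple[str, str, str]] = []  # (source, target, "imply" | "operand")
        self._op_count = 0

    def _vertex(self, side) -> str:
        # A string is a clause vertex; ("AND" | "OR", [clauses]) becomes an
        # operator vertex with one operand edge from each of its clauses.
        if isinstance(side, str):
            self.kind[side] = "clause"
            return side
        op, clauses = side
        vid = f"{op}#{self._op_count}"
        self._op_count += 1
        self.kind[vid] = "operator"
        for clause in clauses:
            self.edges.append((self._vertex(clause), vid, "operand"))
        return vid

    def add(self, lhs, rhs) -> None:
        """Insert one general implication lhs => rhs."""
        source = self._vertex(lhs)
        if not isinstance(rhs, str) and rhs[0] == "AND":
            # Conjunctive: one simple implication per conjunct.
            targets = [self._vertex(clause) for clause in rhs[1]]
        else:
            targets = [self._vertex(rhs)]
        for target in targets:
            self.edges.append((source, target, "imply"))


g = ImplicationGraph()
g.add("x in Zinfandel", ("AND", ["x.hasColor=Red", "x.hasSugar=Dry"]))   # conjunctive
g.add("x in Zinfandel", ("OR", ["x.hasBody=Full", "x.hasBody=Medium"]))  # disjunctive
g.add(("AND", ["x in Wine", "x.hasColor=White"]), "x in WhiteWine")     # complex
for source, target, kind in g.edges:
    print(f"{source} -[{kind}]-> {target}")
```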

The class hierarchies, transitive properties, and implication graphs are extracted from the ontology, and then serialized into Extensible Markup Language (XML) and stored in an ontology repository, such as ontology repository 1008 in FIG. 10. Serialization is the process of converting an in-memory object into a form that can be saved onto a storage medium. The class hierarchies and transitive properties all contain subsumption relationships in a tree data structure. Because the query processing component relies on XPath for subsumption checking, the tree data is serialized in a way that preserves the tree structure in XML.

With reference now to FIG. 18, an example of a class hierarchy is depicted. Tree 1802 may be encoded into the XML code 1804. When serializing the implication graph, subsumption testing is not needed, and so any standard method for encoding graphs to XML may be used.
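One structure-preserving encoding nests each subclass element inside its superclass element, so that subsumption becomes an XPath descendant test. The exact form of XML code 1804 is not reproduced here; this sketch assumes element names equal class names and uses Python's xml.etree.ElementTree, whose ".//name" path selects descendants. A class with several superclasses would have to appear under each of them in this encoding.

```python
import xml.etree.ElementTree as ET


def to_xml(name: str, children: dict[str, list[str]]) -> ET.Element:
    """Serialize a class tree so that XML nesting mirrors subclass edges."""
    element = ET.Element(name)
    for child in children.get(name, []):
        element.append(to_xml(child, children))
    return element


def subsumes(root: ET.Element, ancestor: str, descendant: str) -> bool:
    """True if a `descendant` element appears beneath some `ancestor` element."""
    scopes = [root] if root.tag == ancestor else root.findall(f".//{ancestor}")
    return any(scope.find(f".//{descendant}") is not None for scope in scopes)


tree = {"Wine": ["DessertWine", "WhiteWine"], "DessertWine": ["SweetRiesling"]}
root = to_xml("Wine", tree)
print(ET.tostring(root, encoding="unicode"))
print(subsumes(root, "Wine", "SweetRiesling"))       # True
print(subsumes(root, "WhiteWine", "SweetRiesling"))  # False
```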

With reference now to FIG. 19, a flow diagram of an ontology processor, in which illustrative embodiments may be implemented, is depicted. An ontology processor, such as ontology processor 1010 in FIG. 10, initially receives one or more ontology files and an ontology identifier (step 1902).

The ontology processor registers the ontology files with the ontology identifier (step 1904). The ontology processor extracts and stores the class hierarchy or hierarchies from the ontology files (step 1906). The ontology processor extracts and stores the transitive properties from the ontology files (step 1908). The ontology processor extracts and stores the implication graph from the ontology files (step 1910).

Typically, the ontology processor stores the class hierarchies, transitive properties, and implication graphs extracted in steps 1906, 1908, and 1910, respectively, as a combination of tables and XML data, such as OntologyDocs 1102, OntologyInfo 1104, and TransitiveProperty 1106 in FIG. 11.
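
The storage step might look like the following sketch, which uses SQLite from the Python standard library purely as a stand-in for the relational-XML hybrid database. The table names come from FIG. 11, but every column name, the placeholder document text, and the choice to store XML as plain text are hypothetical.

```python
import sqlite3

# Hypothetical schema: only the table names are taken from FIG. 11.
SCHEMA = """
CREATE TABLE IF NOT EXISTS OntologyDocs (ontology_id TEXT, doc TEXT);
CREATE TABLE IF NOT EXISTS OntologyInfo (
    ontology_id TEXT PRIMARY KEY, class_tree_xml TEXT, implication_graph_xml TEXT);
CREATE TABLE IF NOT EXISTS TransitiveProperty (
    ontology_id TEXT, property TEXT, tree_xml TEXT);
"""


def register_ontology(conn, ontology_id, docs, class_tree_xml, graph_xml, transitive):
    """Register ontology files (step 1904) and store extracted artifacts (1906-1910)."""
    conn.executescript(SCHEMA)
    with conn:  # commit all inserts as one transaction
        conn.executemany("INSERT INTO OntologyDocs VALUES (?, ?)",
                         [(ontology_id, d) for d in docs])
        conn.execute("INSERT INTO OntologyInfo VALUES (?, ?, ?)",
                     (ontology_id, class_tree_xml, graph_xml))
        conn.executemany("INSERT INTO TransitiveProperty VALUES (?, ?, ?)",
                         [(ontology_id, p, xml) for p, xml in transitive.items()])


conn = sqlite3.connect(":memory:")
register_ontology(
    conn, "wine",
    docs=["<!-- ontology file contents -->"],
    class_tree_xml="<hierarchy><Wine><RedWine/></Wine></hierarchy>",
    graph_xml="<graph/>",
    transitive={"locatedIn": "<locatedIn><USRegion><California/></USRegion></locatedIn>"},
)
```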

With reference now to FIG. 20, a flow diagram for extracting a class hierarchy, in which illustrative embodiments may be implemented, is depicted. A class hierarchy, such as the class hierarchy depicted in FIG. 4, is extracted from an ontology (step 2002). The subclass relationships are specified using restrictions (step 2004). The subclass relationships are specified using binary set relations, such as intersection and union operators (step 2006).

With reference now to FIG. 21, a flow diagram for extracting transitive properties, in which illustrative embodiments may be implemented, is depicted. Transitive properties, such as the transitive properties depicted in FIG. 6, are extracted from the ontology (step 2102). The ontology is scanned for all instances of each transitive property (step 2104). A transitive tree is constructed to represent the transitive properties of the ontology (step 2106).
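
Steps 2104 and 2106 can be illustrated with a sketch that builds a transitive tree from the instances of a single transitive property, using `locatedIn` as an example. The instance data and function names are hypothetical, and the sketch assumes each entity has at most one direct container, so the result is a forest rather than a general graph.

```python
from collections import defaultdict


def build_transitive_tree(pairs):
    """Build a forest from (x, y) instances meaning `x locatedIn y`."""
    children = defaultdict(list)
    contained = set()
    for x, y in pairs:
        children[y].append(x)
        contained.add(x)
    roots = sorted({y for _, y in pairs} - contained)  # containers with no container
    return roots, dict(children)


def transitive_closure(entity, pairs):
    """All containers reachable from `entity` (assumes one direct container each)."""
    parent = dict(pairs)
    reached = []
    # The membership check stops the walk if the data happens to contain a cycle.
    while entity in parent and parent[entity] not in reached:
        entity = parent[entity]
        reached.append(entity)
    return reached


# Hypothetical locatedIn instances found by scanning the ontology (step 2104).
located_in = [
    ("Napa", "California"),
    ("Sonoma", "California"),
    ("California", "USRegion"),
    ("Bourgogne", "FrenchRegion"),
]
roots, tree = build_transitive_tree(located_in)
```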

With reference now to FIG. 22, a flow diagram for constructing an implication graph, in which illustrative embodiments may be implemented, is depicted. An empty implication graph is used as the starting point (step 2202).

The ontology is scanned for implications (step 2204). Implications associated with class hierarchies and transitive properties are filtered out, so that only general implications are left (step 2206). One of the general implications is chosen (step 2208). The implication is classified as complex, conjunctive, or disjunctive (step 2210).

If it is determined that, yes, the implication is conjunctive (step 2212), then the implication is decomposed into a set of simple implications (step 2214). If it is determined that, no, the implication is not conjunctive (step 2212), or after the implication is decomposed into a set of simple implications (step 2214), vertices and edges that correspond to the current implication are inserted into the implication graph (step 2216). If there are more implications (step 2218), then another general implication is chosen (step 2208). If there are no more implications (step 2218), then the operation ends.

With reference now to FIG. 23, a base table of wine products and an associated wine ontology is depicted in which illustrative embodiments may be implemented. In base table of wine products and associated wine ontology 2300, wine table 2302 is associated with wine ontology 2304.

One way of associating wine table 2302 with wine ontology 2304 is to ensure that the column names of wine table 2302 are consistent with the property names used in wine ontology 2304. The column names are also known as the relational attributes.

Another way of associating wine table 2302 with wine ontology 2304 is for the user to provide a mapping of the relational attributes to the associated properties in wine ontology 2304. Each row from wine table 2302 is associated with an entity in the ontology.

The above two techniques may also be combined. For example, some column names of wine table 2302 may be the same as the property names used in wine ontology 2304, while the remaining columns of wine table 2302 may be associated with wine ontology 2304 by providing a mapping of the relational attributes to the associated properties in the ontology.
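
The combined association technique can be sketched as a function that applies a user-supplied mapping first and falls back to matching column names against property names. The function name, the column and property `year`, and the error handling are assumptions; only the origin-to-locatedIn and maker-to-hasMaker associations come from FIG. 23.

```python
def map_columns(columns, ontology_properties, user_mapping=None):
    """Associate relational attributes with ontology properties.

    A user-supplied mapping takes precedence; otherwise a column is associated
    with the property of the same name. Unmapped columns are omitted.
    """
    user_mapping = user_mapping or {}
    mapping = {}
    for column in columns:
        if column in user_mapping:
            prop = user_mapping[column]
            if prop not in ontology_properties:
                raise ValueError(f"{column!r} maps to unknown property {prop!r}")
            mapping[column] = prop
        elif column in ontology_properties:
            mapping[column] = column  # same name in table and ontology
    return mapping


# Hypothetical wine table columns; the origin/maker mapping follows FIG. 23.
mapping = map_columns(
    columns=["type", "origin", "maker", "year"],
    ontology_properties={"locatedIn", "hasMaker", "year"},
    user_mapping={"origin": "locatedIn", "maker": "hasMaker"},
)
```

Here `type` is left unmapped because, in the FIG. 23 example, that column associates each row with a class in the class hierarchy rather than with a property.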

In FIG. 23, each row 2306, 2308, and 2310 of wine table 2302 is associated with an instance of the wine class. For example, arrow 2312 shows that the type Burgundy in row 2306 is associated with the wine class Burgundy 2314 in the class hierarchy.

Similarly, columns 2316, 2318, 2320, and 2322 of wine table 2302 may be associated with properties in wine ontology 2304. For example, the origin attribute, column 2318, is associated with the property locatedIn 2326 of wine ontology 2304. The maker attribute, column 2320, is associated with the property hasMaker 2328 of wine ontology 2304.

Processing a Query

A query processor evaluates the predicate of the query for every row in the base table and returns the row or rows that satisfy the predicate. Each predicate may eliminate rows from the base table: if the query processor determines that a row does not satisfy a predicate, the query processor moves on to the next row in the base table. Typically, predicate evaluation is straightforward. Values of interest are successively extracted from each row in the base table and evaluated against the predicate to determine whether the predicate is satisfied.

Referring now to FIG. 24, a flow diagram of a query processor, in which illustrative embodiments may be implemented, is depicted. A query processor, such as virtual view query processor 312 in FIG. 3, receives a query from a user. The query processor rewrites the query so that the query may be processed by a relational-XML hybrid database, such as query engine 314 in FIG. 3.

If the query can be answered using data contained in a base table, such as base table 308 in FIG. 3, then the query is processed conventionally; that is, the query is processed as a query to a relational database. However, if the query requires an inference using the ontology, then the query processor rewrites the query. FIG. 24 illustrates the steps the query processor takes when rewriting the query.

Given a query with a predicate involving attributes not in the base table, the query is rewritten in two stages. First, the query predicate is expanded using the implication graph (steps 2402-2408). Second, each clause is rewritten to include subsumption checking (steps 2410-2414).

The process of rewriting the query begins by searching the implication graph for matching sub-graphs (step 2402). Implication graph 1702 in FIG. 17 is an example of an implication graph. The query predicate may consist of a single clause or multiple clauses.

If the query predicate in step 2402 consists of a single clause q, then the implication graph is searched for the vertex for q. The query predicate is then rewritten as follows. Starting from the vertex for q, all dependent clauses, that is, all vertices whose clauses imply q either directly or through a chain of imply edges, are enumerated from the graph, and the query predicate is rewritten as a disjunction of the original predicate and all of its dependent clauses (step 2404).

If the query predicate in step 2402 consists of multiple clauses joined by a disjunction or a conjunction, then the implication graph is searched for a collection of matching sub-graphs. For each matching sub-graph, the implication graph is traversed starting from the sub-graph, and all the dependent clauses are retrieved (step 2404).

Once all the dependent clauses of the matching sub-graph have been enumerated, any duplicate dependent clauses are eliminated by keeping track of which vertices have been traversed before and removing duplicate traversals (step 2406).

Once a sub-graph has been processed, the query processor checks whether there is another matching sub-graph (step 2408). If the answer is “yes” and there is another sub-graph, then the query processor goes back and repeats the previous steps (steps 2404 and 2406) until all sub-graphs have been processed. If the answer is “no” and there are no more sub-graphs, then the predicate is rewritten as a disjunction of the predicate itself and all the dependent clauses (step 2410).
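
The expansion stage (steps 2402 through 2410) can be sketched as a reverse traversal of imply edges with a visited set. This is a minimal sketch under stated assumptions: implications are `(antecedent, consequent)` pairs, a dependent clause is taken to be any vertex from which the query vertex is reachable along imply edges, and each disjunct of an OR predicate is treated as an additional matching sub-graph. The implication data and the function name are hypothetical.

```python
from collections import defaultdict


def expand_predicate(predicate, implications):
    """Expand a predicate with every clause that (transitively) implies it.

    `predicate` is a clause string or a tuple ("AND" | "OR", (clauses...)).
    `implications` is a list of (antecedent, consequent) pairs.
    """
    implied_by = defaultdict(set)  # reverse imply edges: consequent -> antecedents
    for antecedent, consequent in implications:
        implied_by[consequent].add(antecedent)

    # Matching sub-graphs: the predicate itself, plus each disjunct of an OR,
    # because anything that implies a disjunct also implies the disjunction.
    starts = [predicate]
    if isinstance(predicate, tuple) and predicate[0] == "OR":
        starts += list(predicate[1])

    visited = {predicate}  # vertices already traversed (duplicate elimination, step 2406)
    dependents = []
    for start in starts:
        stack = [start]
        while stack:
            vertex = stack.pop()
            for antecedent in implied_by.get(vertex, ()):
                if antecedent not in visited:
                    visited.add(antecedent)
                    dependents.append(antecedent)
                    stack.append(antecedent)
    if not dependents:
        return predicate
    return ("OR", (predicate, *dependents))  # step 2410


implications = [
    ("type=RedWine", "hasColor=red"),
    ("type=Zinfandel", "hasColor=red"),
    ("hasGrape=Gamay", "type=Beaujolais"),
    ("type=Beaujolais", "hasColor=red"),
]
expanded = expand_predicate("hasColor=red", implications)
```

Each dependent clause implies the original predicate, so adding it as a disjunct cannot admit a row that the ontology says fails the original predicate. The order of the dependents is not significant.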

Next, each dependent clause in the expanded predicate is rewritten using a subsumption predicate. After the query expansion, the query predicate is a Boolean expression of multiple clauses. Each clause is examined to determine whether the clause contains a subsumption relationship. If a clause has a subsumption relationship, the clause is rewritten to include a subsumption predicate (step 2412).

If there is another clause in the expanded query, the previous step is repeated (step 2414). If there are no more clauses, then the process ends.
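
Step 2412 can be sketched as a recursive walk over the expanded Boolean expression that replaces each clause on a hierarchy-backed attribute with a subsumption test. The attribute table, the alias `H.tree`, the column `W.type`, and the function names are hypothetical. The output mirrors the shorthand `XMLExists(T.tree//USRegion//W.origin)` used in this description rather than full SQL/XML syntax, which typically passes an XQuery expression string and a PASSING clause to XMLEXISTS.

```python
# Hypothetical table: attribute -> (XML tree column, base-table column).
# The locatedIn entry mirrors the FIG. 26 example; the type entry is invented.
SUBSUMPTION_ATTRS = {
    "locatedIn": ("T.tree", "W.origin"),
    "type": ("H.tree", "W.type"),
}


def rewrite_clause(clause):
    """Replace attr=value with a subsumption test when attr has a hierarchy."""
    attr, _, value = clause.partition("=")
    if attr not in SUBSUMPTION_ATTRS:
        return clause  # no subsumption relationship: leave the clause unchanged
    tree, column = SUBSUMPTION_ATTRS[attr]
    return f"XMLExists({tree}//{value}//{column})"


def rewrite_predicate(expr):
    """Walk the expanded Boolean expression and rewrite every clause."""
    if isinstance(expr, str):
        return rewrite_clause(expr)
    op, terms = expr
    return f" {op} ".join(f"({rewrite_predicate(t)})" for t in terms)
```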

With reference now to FIG. 25, an example of a query is depicted in which illustrative embodiments may be implemented. For query 2502, the two relevant implications from the ontology are implications 2504. Query 2502 may be expanded using implications 2504 into expanded query 2506.

After expansion, each clause is examined to determine whether there is an associated subsumption, and if there is, the clause is rewritten. One way of performing subsumption checking is to use XPath and the SQL/XML function XMLExists.

With reference now to FIG. 26, an example of a query is depicted in which illustrative embodiments may be implemented. Query 2602 is a query against Table 2, in which locatedIn is a virtual column created from the transitive closure of W.origin. Query 2602 may be rewritten as query 2604.

In query 2604, XMLExists(T.tree//USRegion//W.origin) performs subsumption checking using XPath and the SQL/XML function XMLExists. Those skilled in the art will appreciate that other, similar ways of performing subsumption checking may be used instead of the SQL/XML function XMLExists.
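
To see what the XMLExists test in query 2604 checks for each row, the following sketch evaluates the same path pattern with Python's `xml.etree.ElementTree`, binding each row's `origin` value into the path. The tree contents, the rows, and the helper name `xml_exists` are hypothetical, and ElementTree is only a stand-in for the database's XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical transitive tree for locatedIn (T.tree in the query above).
T_TREE = ET.fromstring(
    "<locatedIn>"
    "<USRegion><California><Napa/><Sonoma/></California></USRegion>"
    "<FrenchRegion><Bourgogne/></FrenchRegion>"
    "</locatedIn>"
)


def xml_exists(tree, ancestor, value):
    """Stand-in for XMLExists(T.tree//ancestor//value) with the row value bound in."""
    return tree.find(f".//{ancestor}//{value}") is not None


# Hypothetical rows of Table 2 (origin values must be valid XML element names).
rows = [
    {"type": "Zinfandel", "origin": "Napa"},
    {"type": "CotesDOr", "origin": "Bourgogne"},
]
us_wines = [row for row in rows if xml_exists(T_TREE, "USRegion", row["origin"])]
```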

With reference now to FIG. 27, an example of a query is depicted in which illustrative embodiments may be implemented. Referring again to Table 2, query 2702 is a query against Table 2. Assume that the implication graph of the ontology contains implications 2704 relevant to “hasColor=red”. Assume also that class hierarchy 2706 is a portion of the class hierarchy in the ontology.

The query predicate contains a constraint on hasColor, and hasColor is a virtual column that is not in the base table; therefore, the query is expanded using implications 2704 to create expanded query 2708. The type attribute B.type is associated with a recursive type hierarchy in the ontology, so subsumption checking is applied to create rewritten query 2710.

When rewritten query 2710 is processed on Table 2 using the relational-XML hybrid database, the row for CotesDOr satisfies the query because it is a RedWine, and the row for Zinfandel also satisfies the query. The isSubsumed() function is implemented using the SQL/XML function XMLExists because the class hierarchies are encoded as an XML tree.
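
The behavior described for FIG. 27 can be reproduced in memory with the sketch below. Class hierarchy 2706, implications 2704, and the rows of Table 2 are not reproduced in this text, so the hierarchy, the two red types, and the rows here are hypothetical stand-ins chosen to match the stated outcome. The Python `is_subsumed` function is an analogue of the isSubsumed() function, not its implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for class hierarchy 2706.
HIERARCHY = ET.fromstring(
    "<Thing><Wine>"
    "<RedWine><Burgundy><CotesDOr/></Burgundy></RedWine>"
    "<Zinfandel/>"
    "<WhiteWine><Riesling/></WhiteWine>"
    "</Wine></Thing>"
)

# Hypothetical stand-ins for implications 2704: each of these types implies hasColor=red.
RED_TYPES = ["RedWine", "Zinfandel"]


def is_subsumed(cls, ancestor):
    """True if cls equals ancestor or lies beneath it in the hierarchy."""
    return cls == ancestor or HIERARCHY.find(f".//{ancestor}//{cls}") is not None


def satisfies_rewritten_query(row):
    # Original clause hasColor=red: the virtual column is absent from base rows.
    if row.get("hasColor") == "red":
        return True
    # Expanded clauses, each rewritten with a subsumption test on the type column.
    return any(is_subsumed(row["type"], t) for t in RED_TYPES)


table2 = [{"type": "CotesDOr"}, {"type": "Zinfandel"}, {"type": "Riesling"}]
answer = [row["type"] for row in table2 if satisfies_rewritten_query(row)]
```

In this toy data the Riesling row is excluded because it lies under WhiteWine, while CotesDOr qualifies only through the subsumption test on RedWine.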

Thus, a user can formulate an ontology-based query, similar to a conventional relational query, and have it answered. By rewriting the user's query in the manner described above, and by using information extracted from the ontology, such as the implications, transitive properties, and class hierarchies, the user's query can be used to infer knowledge not in the relational base tables.

The different embodiments provide a method, apparatus, and computer program product for querying data in a database. An ontology is associated with the data. Responsive to receiving a query from a requestor, relational data in the database is identified using the query to form identified relational data. Ontological knowledge in the ontology is identified using the identified relational data and the ontology. A result is returned to the requestor.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of some possible implementations of systems, methods, and computer program products according to various embodiments. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.

The invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.

Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any tangible apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W), and DVD.

A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.

Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.

Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.

The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention and its practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

1. A computer implemented method for querying relational data and semi-structured ontology data in a database, the computer implemented method comprising: associating ontological data in an ontology with relational data in a database; creating a virtual view comprising actual columns and virtual columns, wherein values in the actual columns comprise the associated relational data from the database and values in the virtual columns comprise the associated ontological data from the ontology, wherein the associated ontological data in the virtual columns is not materialized and only inferred when a query is made that requires that values be derived; receiving a query from a requestor requiring an inference of virtual view data in the virtual columns; and returning a result of the query, wherein the result comprises relational data and materialized values derived in real time from the virtual view data in the virtual columns.
2. The computer implemented method of claim 1, wherein associating ontological data in the ontology with the relational data in the database comprises: extracting class hierarchies, implication rules, and transitive properties from the ontology; and encoding the class hierarchies, implication rules, and transitive properties to form encoded class hierarchies, encoded implication rules, and encoded transitive properties.
3. The computer implemented method of claim 2, wherein the result is determined using at least one of the encoded class hierarchies, the encoded implication rules, or the encoded transitive properties.
4. The computer implemented method of claim 2, wherein the database stores the relational data using relational tables and stores the encoded class hierarchies, encoded implication rules, and encoded transitive properties using extensible markup language.
5. The computer implemented method of claim 1, further comprising: specifying a relationship between a subset of the relational data and a subset of the ontological data, wherein the virtual view displays the subset of the relational data and the subset of the ontological data as a relational table.
6. The computer implemented method of claim 5, wherein the result is determined using the relationship between the subset of the relational data and the subset of the ontological data.
7. A database management system for querying relational data and semi-structured ontology data in a database, the database management system comprising: a bus; a storage device connected to the bus, wherein the storage device contains computer usable code and the data; a communications unit connected to the bus; and a processing unit connected to the bus for executing the computer usable code, wherein the processing unit executes the computer usable code to associate ontological data in an ontology with relational data in a database; create a virtual view comprising actual columns and virtual columns, wherein values in the actual columns comprise the associated relational data from the database and values in the virtual columns comprise the associated ontological data from the ontology, wherein the associated ontological data in the virtual columns is not materialized and only inferred when a query is made that requires that values be derived; receive a query from a requestor requiring an inference of virtual view data in the virtual columns; and return a result of the query, wherein the result comprises relational data and materialized values derived in real time from the virtual view data in the virtual columns.
8. The database management system of claim 7, wherein associating ontological data in the ontology with the relational data in the database comprises: extracting class hierarchies, implication rules, and transitive properties from the ontology; and encoding the class hierarchies, implication rules, and transitive properties to form encoded class hierarchies, encoded implication rules, and encoded transitive properties.
9. The database management system of claim 8, wherein the result is determined using at least one of the encoded class hierarchies, the encoded implication rules, or the encoded transitive properties.
10. The database management system of claim 7, wherein the requestor specifies a relationship between a subset of the relational data and a subset of the ontological data, wherein the virtual view displays the subset of the data and the subset of the ontology as a relational table.
11. The database management system of claim 10, wherein the result is determined using the relationship between the subset of the relational data and the subset of the ontological data.
 12. (canceled)
13. The database management system of claim 8, wherein the relational data is stored in the database using relational tables, and the encoded class hierarchies, the encoded implication rules, and the encoded transitive properties are stored in the database using extensible markup language.
14. A computer program product comprising a computer usable medium including computer usable program code for querying relational data and semi-structured ontology data in a database, the computer program product comprising: computer usable code for associating ontological data in an ontology with relational data in a database; computer usable code for creating a virtual view comprising actual columns and virtual columns, wherein values in the actual columns comprise the associated relational data from the database and values in the virtual columns comprise the associated ontological data from the ontology, wherein the associated ontological data in the virtual columns is not materialized and only inferred when a query is made that requires that values be derived; computer usable code for receiving a query from a requestor requiring an inference of virtual view data in the virtual columns; and computer usable code for returning a result of the query, wherein the result comprises relational data and materialized values derived in real time from the virtual view data in the virtual columns.
15. The computer program product of claim 14, wherein the computer usable code for associating ontological data in the ontology with the relational data in the database comprises: computer usable code for extracting class hierarchies, implication rules, and transitive properties from the ontology; and computer usable code for encoding the class hierarchies, implication rules, and transitive properties to form encoded class hierarchies, encoded implication rules, and encoded transitive properties.
16. The computer program product of claim 15, wherein the result is determined using at least one of the encoded class hierarchies, the encoded implication rules, or the encoded transitive properties.
17. The computer program product of claim 14, further comprising: computer usable code for specifying a relationship between a subset of the relational data and a subset of the ontological data, wherein the virtual view displays the subset of the relational data and the subset of the ontological data as a relational table.
18. The computer program product of claim 17, wherein the result is determined using the relationship between the subset of the relational data and the subset of the ontological data.
 19. (canceled)
20. The computer program product of claim 15, wherein the relational data is stored using relational tables, and the encoded class hierarchies, the encoded implication rules, and the encoded transitive properties are stored in the database using extensible markup language.
21. The computer implemented method of claim 1, further comprising: receiving a query from a requestor not requiring an inference of virtual view data in the virtual columns; and returning a result of the query, wherein the result comprises relational data.
22. The database management system of claim 7, wherein the processing unit further executes the computer usable code to receive a query from a requestor not requiring an inference of virtual view data in the virtual columns; and return a result of the query, wherein the result comprises relational data.